MeCP2 based therapy

ABSTRACT

The present invention relates to synthetic polypeptides that are useful in the treatment of disorders associated with reduced MeCP2 activity, including Rett syndrome. The present invention provides synthetic polypeptides comprising: i) an MBD amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 1; and ii) an NID amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 2, wherein the polypeptide has a deletion of at least 50 amino acids when compared to the full length MeCP2 e1 and e2 sequences. The invention further provides nucleic acid constructs, expression vectors, virions, pharmaceutical compositions, and cells providing polynucleotides of the invention. The invention further provides methods of treating or preventing disease in an animal comprising administering to said animal a synthetic polypeptide according to the invention.

FIELD OF THE INVENTION

The present invention relates to synthetic polypeptides that are useful in the treatment of disorders associated with reduced MeCP2 activity, including Rett syndrome. The invention also relates to nucleic acid constructs, expression vectors, virions and cells for expressing the synthetic polypeptides. Further, the invention concerns methods of treating disorders, such as Rett syndrome, using the synthetic polypeptides of the invention, the use of the synthetic polypeptides, nucleic acid constructs, expression vectors, virions and cells in the manufacture of medicaments for the treatment of disorders associated with reduced MeCP2 activity, including Rett syndrome, and pharmaceutical compositions comprising the synthetic polypeptides, nucleic acid constructs, expression vectors, and virions of the invention.

BACKGROUND TO THE INVENTION

Methyl CpG-binding Protein 2 (MeCP2) is a nuclear protein that was named for its ability to preferentially bind methylated DNA. Interest in MeCP2 increased when mutations in the MECP2 gene were identified in the majority of Rett syndrome patients.

Rett syndrome occurs in about 1 in 15,000 girls. Although it is in theory a rare disease because it affects fewer than 1 in 2000 individuals, it is actually one of the most common genetic causes of intellectual disability in women. It was originally considered to be a neurodevelopmental disorder, due to the decreased, arrested and retarded development of those with the disorder from the age of about 6 months. However, the fact that some of the main symptoms have been found to be reversible in a mouse model of the disease means that it is now generally considered to be a neurological disorder.

The MECP2 gene is located on the X chromosome. It spans 76 kb and is composed of four exons. The MeCP2 protein has two isoforms, MeCP2 e1 and MeCP2 e2, which differ at the N-terminus of the protein. The isoform e1 is made up of 498 amino acids and isoform e2 is 486 amino acids long. The MECP2 (human) and Mecp2 (mouse) genes consist of four exons and undergo alternative splicing to produce the two mRNA species: e1 consists of exons 1, 3 and 4; and e2 consists of exons 1, 2, 3 and 4. Translation starts from exon 1 or 2 in isoforms e1 and e2, respectively. Since the vast majority of the coding sequence is in exons 3 and 4, these two isoforms are very similar and only differ at the extreme N-termini. The mRNA of the MECP2 e1 variant has greater expression in the brain than that of the MECP2 e2 variant, and the e1 protein is more abundant in the mouse and human brain.

MeCP2 is an abundant mammalian protein that selectively binds 5-methyl cytosine residues in symmetrically methylated mCpG dinucleotides and asymmetrically methylated mCpA dinucleotides. CpG dinucleotides are preferentially located in the promoter regions of genes, but these are mostly unmethylated. In comparison, it is the CpG dinucleotides in the bulk genome that are highly methylated and it is to these that MeCP2 binds. The presence of mCpA methylation in neurons further increases the number of binding sites. In this way, MeCP2 regulates gene transcription by binding in the main body of gene sequences¹.

MeCP2 is highly conserved across vertebrates, and at least six biochemically distinct domains have been identified in the protein², including High Mobility Group Protein-like Domains, the Methyl Binding Domain (MBD), the Transcriptional Repression Domain comprising the NCoR/SMRT Interaction Domain (NID), and the C-terminal domains α and β. Functionally, MeCP2 has been implicated in several cellular processes based on its reported interaction with >40 binding partners³, including transcriptional co-repressors⁴ (e.g. NCoR/SMRT⁵), transcriptional activators⁶, RNA⁷, chromatin remodellers⁸,⁹, microRNA-processing proteins¹⁰, and splicing factors¹¹. Accordingly, MeCP2 has been cast as a multi-functional hub that integrates diverse functions that are essential in mature neurons¹².

There are currently no treatments available that are specific for Rett syndrome. Instead, treatment generally involves treating the symptoms of the disease using traditional drugs, whilst preventative strategies involve aggressive nutritional management, prevention of gastrointestinal and orthopedic complications, and rehabilitation therapies. Thus there remains a pressing need for a means of specifically treating and preventing the development of Rett syndrome.

SUMMARY OF THE INVENTION

Accordingly, the present invention provides synthetic polypeptides comprising an MBD amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 1 and an NID amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 2.

The synthetic polypeptides of the invention may have a deletion or substitution of at least 50 amino acids compared to the full length MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4). Additionally, or alternatively, the synthetic polypeptides of the invention have less than 90% identity over the entire length of the amino acid sequence of MeCP2 as depicted in SEQ ID NO: 3 and/or SEQ ID NO: 4.

Generally, the synthetic polypeptides of the invention will comprise MBD and NID domains in accordance with MeCP2, but will be lacking other parts of the natural sequence of MeCP2.

Thus any deletion or substitution in the synthetic polypeptide may be of a part of the natural sequence of MeCP2, but not of a part of the MBD or NID of MeCP2. Thus the synthetic polypeptides of the invention may generally have the structure

    A-B-C-D-E

wherein portion B of the synthetic polypeptide is the MBD amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 1, and portion D of the synthetic polypeptide is the NID amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 2, and further wherein at least one of the following three is true: portion A of the synthetic polypeptide is less than 30 amino acids long and/or has less than 90% identity to the amino acid sequences as depicted in SEQ ID NOs: 5 and 6, calculated over the entire length of the amino acid sequences as depicted in SEQ ID NOs: 5 and 6; portion C of the synthetic polypeptide is less than 20 amino acids long and/or has less than 90% identity to the amino acid sequence as depicted in SEQ ID NO: 7, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 7; and portion E of the synthetic polypeptide is absent, a polypeptide tag, and/or has less than 90% identity to the amino acid sequence as depicted in SEQ ID NO: 8, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 8. The skilled person will appreciate that this general structure is disclosed from left to right in the accepted N-terminal to C-terminal direction of the synthetic polypeptide.
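The "at least one of three" condition on portions A, C and E can be read as a simple logical test. The following Python sketch is illustrative only: the class name `Portion`, its fields, and the example values are hypothetical, and the identity figures are assumed to have been computed elsewhere over the entire length of the relevant reference sequence (for portion A, the higher of the two values obtained against SEQ ID NOs: 5 and 6).

```python
from dataclasses import dataclass


@dataclass
class Portion:
    """Hypothetical summary of one flanking or linking portion (A, C or E)."""
    length: int                  # residues in the portion; 0 means the portion is absent
    identity_to_native: float    # % identity to the native reference, computed elsewhere
    is_tag: bool = False         # only relevant to portion E


def portion_a_qualifies(a: Portion) -> bool:
    # Portion A: fewer than 30 residues and/or less than 90% identity.
    return a.length < 30 or a.identity_to_native < 90.0


def portion_c_qualifies(c: Portion) -> bool:
    # Portion C: fewer than 20 residues and/or less than 90% identity.
    return c.length < 20 or c.identity_to_native < 90.0


def portion_e_qualifies(e: Portion) -> bool:
    # Portion E: absent, a polypeptide tag, and/or less than 90% identity.
    return e.length == 0 or e.is_tag or e.identity_to_native < 90.0


def satisfies_general_structure(a: Portion, c: Portion, e: Portion) -> bool:
    # At least one of the three conditions must hold.
    return portion_a_qualifies(a) or portion_c_qualifies(c) or portion_e_qualifies(e)
```

For example, a construct whose portions A and C are native-like but whose portion E is absent would satisfy the test, whereas one in which all three portions are native-like would not.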

The invention also provides nucleic acid constructs encoding a synthetic polypeptide of the invention, and expression vectors comprising a nucleotide sequence encoding a synthetic polypeptide of the invention. The expression vector may be a viral vector, and thus the invention also provides a virion comprising an expression vector according to the invention. The invention also provides cells that comprise a synthetic genetic construct adapted to express a polypeptide of the invention, cells comprising a vector of the invention, and cells for producing a virion of the invention.

The invention also provides pharmaceutical compositions comprising the synthetic polypeptides, nucleic acid constructs, expression vectors and/or virions of the invention.

The synthetic polypeptides of the invention have utility in medicine, and particularly in the treatment of neurological disorders associated with inactivation, such as an inactivating mutation, of MECP2. Such disorders include Rett syndrome. Therefore the invention provides a method of treating or preventing disease in an animal comprising administering to said animal a synthetic polypeptide of the invention. Said administering may comprise administering a synthetic polypeptide of the invention, an expression vector of the invention, a virion of the invention and/or a pharmaceutical composition of the invention.

Furthermore, the invention provides synthetic polypeptides of the invention, expression vectors of the invention, and virions of the invention for the treatment or prevention of a neurological disorder associated with inactivating mutation of MECP2, for example Rett syndrome. The invention also provides the use of synthetic polypeptides of the invention, expression vectors of the invention, and virions of the invention in the manufacture of a medicament for the treatment or prevention of a neurological disorder associated with inactivating mutation of MECP2, for example Rett syndrome.

DETAILED DESCRIPTION

The synthetic polypeptides of the invention provide therapeutic proteins that can be used in the treatment of disorders that are caused by inactivation or reduced activity of MeCP2. Such disorders include, in particular, Rett syndrome. The nucleic acid constructs, expression vectors, virions and cells of the invention can be used to produce and, optionally, deliver the synthetic polypeptides. The invention also provides methods of treatment using the products of the invention, and the use of those products in those treatments.

The invention is based on the inventors' surprising and unexpected finding that it is a deficiency in the biological activity associated with the MBD and NID of MeCP2 that is key to the development of Rett syndrome, such that other parts of the protein are not necessary, despite the fact that MeCP2 is a highly conserved protein. They generated and tested the hypothesis that the MeCP2 functions that are vital in Rett syndrome are those due to MeCP2 forming a bridge between chromatin and the NCoR/SMRT complex, so that all other domains of MeCP2 are dispensable. Furthermore, they hypothesised that the Rett syndrome mutations occurring within the MBD or NID domains interfere directly with this function, and that Rett syndrome mutations occurring outside the MBD and NID either destabilise the protein generally or specifically impair the bridge between the chromatin and the NCoR/SMRT complex. As a result of their studies and understanding, they have concluded that the MBD and the NID are therefore sufficient for the MeCP2 function required to treat or prevent Rett syndrome and similar disorders. This means that they are able to provide a “mini-MeCP2” protein derivative by jettisoning a significant portion, for example in some embodiments up to 50-65%, of the native MeCP2 protein.

As discussed elsewhere in the specification, the inventors' conclusion means that therapeutic synthetic polypeptides can be prepared that are considerably smaller than the full length MeCP2. Of course, this means that they can be easier to produce and effectively deliver to patients. For example, some delivery vehicles, such as adeno-associated viral (AAV) vectors, are restricted as to the amount of payload they can carry, so the ability to lighten that load by encoding a smaller polypeptide is advantageous. Also, the removal of unnecessary parts of the MeCP2 polypeptide means that alternative polypeptide sequence, such as peptide tags, regulatory tags and/or signaling peptides can be inserted in the polypeptide in some embodiments without making the polypeptide overly large. Similarly, the smaller protein means that a smaller nucleic acid sequence can encode the protein, such that when there are size constraints on the amount of nucleic acid sequence that can be included in a particular construct in some embodiments of the invention, the constructs of that type that encode the polypeptides of the invention may include additional sequences, such as regulatory elements, that would be difficult to include if the full-length MeCP2 protein was being encoded instead of the smaller polypeptide, due to the size constraints. Furthermore, the removal of other biologically active, but unnecessary, parts of the MeCP2 protein means that there may be less chance of unwanted side-effects due to interactions of those parts of the protein during the therapeutic or preventative treatment.

Thus the invention provides improved methods of treating or preventing disorders associated with reduced MeCP2 activity, such as Rett syndrome, as well as therapeutic products for use in those methods.

Furthermore, the inventors have surprisingly and unexpectedly found that although deletion of the part of MeCP2 that links the MBD and NID domains appears to reduce the stability of the synthetic polypeptide having the MBD and NID domains, this reduced stability can have a beneficial effect as it reduces the toxicity that can be associated with over-dosing of subjects with a MeCP2 polypeptide. Thus the invention also provides in some embodiments of the invention improved methods that are safer and less likely to be associated with toxic side effects, as well as the therapeutic products for use in those methods, which provide synthetic polypeptides lacking at least part of the amino acid sequence that links the MBD and NID in MeCP2.

In order to assist the understanding of the present invention, certain terms used herein will now be further defined, and more generally further details of the invention will be given, in the following paragraphs.

Synthetic Polypeptides

The invention provides synthetic polypeptides comprising an MBD amino acid sequence and an NID amino acid sequence.

As used herein, the term “polypeptide” can be used interchangeably with “peptide” or “protein”, and means at least two covalently attached alpha amino acid residues linked by a peptidyl bond. The term polypeptide encompasses purified natural products, or chemical products, which may be produced partially or wholly using recombinant or synthetic techniques. The term polypeptide may refer to a complex of more than one polypeptide, such as a dimer or other multimer, a fusion protein, a protein variant, or derivative thereof. The term also includes modified proteins, for example, a protein modified by glycosylation, acetylation, phosphorylation, pegylation, ubiquitination, and so forth. A polypeptide may comprise amino acids not encoded by a nucleic acid codon.

As used herein, the term “synthetic polypeptide” refers to polypeptide sequences formed by processes through human agency. The synthetic polypeptides of the invention are based on MeCP2 in that they have biologically active MBD and NID sequences, such as those that occur in wild type MeCP2 proteins, but are distinguished from the naturally occurring MeCP2 proteins. The polypeptides of the invention are synthetic because they include mutations, such as amino acid deletions, substitutions and/or insertions, in the wild type MeCP2 sequences such that the resultant synthetic polypeptides are not known from the art as natural polypeptides.

“Naturally occurring,” “native,” or “wild-type” is used to describe an object that can be found in nature as distinct from being artificially produced. For example, a protein or nucleotide sequence present in an organism (including a virus), which can be isolated from a source in nature and that has not been intentionally modified by a person in the laboratory, is naturally occurring.

The Methyl-CpG Binding Domain (MBD) of MeCP2 has the ability to bind methylated DNA. The human MeCP2 MBD sequence is provided herein as SEQ ID NO: 1. It consists of amino acids 72 to 173, inclusive, of the human MeCP2 protein (numbering refers to the e2 isoform, i.e. as in SEQ ID NO: 4). The mouse MeCP2 MBD sequence is identical to the human sequence. The polypeptides of the invention comprise an MBD having at least 70% similarity to this MeCP2 MBD sequence (SEQ ID NO: 1). Preferably the polypeptides of the invention comprise an MBD having at least 70%, 75%, 80%, 85%, 88%, 90%, 92%, 94%, 95%, 96%, 97%, 98% or 99% similarity to the human MeCP2 MBD sequence (SEQ ID NO: 1). Further preferably the polypeptides of the invention comprise an MBD having at least 90% similarity. Most preferably the polypeptides of the invention comprise the human MeCP2 MBD sequence (SEQ ID NO: 1). The MBD sequences of the synthetic polypeptides of the invention have the ability to bind methylated DNA.

The MBD sequence of particular interest for the synthetic polypeptides of the invention is that of the amino acids at positions 78 to 162 of the MeCP2 e2 isoform (SEQ ID NO: 4). Thus in preferred embodiments of the invention the polypeptides of the invention comprise an MBD having at least 85%, 88%, 90%, 92%, 94%, 95%, 96%, 97%, 98% or 99% similarity to the sequence of amino acids from positions 78 to 162 of the MeCP2 e2 isoform (SEQ ID NO: 4).

Most preferably MBD amino acid sequences of the polypeptides of the invention comprise the sequence of amino acids from positions 78 to 162 of the MeCP2 e2 isoform (SEQ ID NO: 4).

The MBD sequence of MeCP2 includes several phosphorylation sites (Ser80, Ser86, Thr148/9 and Ser164; numbering with respect to the e2 isoform). Phosphorylation at Ser80 and Ser164, at least, has been associated with affecting the activity of MeCP2. Therefore it is preferred that one, more, or all of these amino acids are retained in the MBD sequences of the synthetic polypeptides of the invention.

The MBD sequence of the invention may correspond to that of a naturally occurring MeCP2 MBD sequence, for example the sequence of MBD in the zebrafish homolog of MeCP2.

The NCoR/SMRT Interaction Domain (NID) of MeCP2 is the domain through which MeCP2 interacts with the NCoR/SMRT co-repressor complexes. The human MeCP2 NID sequence is provided herein as SEQ ID NO: 2. It consists of amino acids 272 to 312, inclusive, of the human MeCP2 protein (numbering refers to the e2 isoform, i.e. as in SEQ ID NO: 4). The mouse MeCP2 NID sequence is identical to the human sequence, except for amino acid position 297 in SEQ ID NO: 4 (i.e. the amino acid at position 26 in SEQ ID NO: 2), which is histidine in mouse but glutamine in human. The polypeptides of the invention comprise an NID amino acid sequence having at least 70% similarity to this MeCP2 NID sequence (SEQ ID NO: 2). Preferably the polypeptides of the invention comprise an NID having at least 75%, 80%, 85%, 88%, 90%, 92%, 94%, 95%, 96%, 97%, 98% or 99% similarity to the human MeCP2 NID sequence (SEQ ID NO: 2). Further preferably the polypeptides of the invention comprise an NID having at least 90% similarity. Most preferably the polypeptides of the invention comprise the human MeCP2 NID sequence (SEQ ID NO: 2). The NID sequences of the synthetic polypeptides of the invention have the ability to interact, or bind, with the NCoR/SMRT co-repressor complex.

The NID sequence of particular interest for the synthetic polypeptides of the invention is that of the amino acids at positions 298 to 309 of the MeCP2 e2 isoform (SEQ ID NO: 4). Thus in preferred embodiments of the invention the polypeptides of the invention comprise an NID having at least 80%, 85%, 88%, 90%, 92%, 94%, 95%, 96%, 97%, 98% or 99% similarity to the sequence of amino acids from positions 298 to 309 of the MeCP2 e2 isoform (SEQ ID NO: 4). Most preferably NID amino acid sequences of the polypeptides of the invention comprise the sequence of amino acids from positions 298 to 309 of the MeCP2 e2 isoform (SEQ ID NO: 4).

The NID sequence of MeCP2 includes phosphorylation sites (Thr308 and Ser274; numbering with respect to the e2 isoform), the former of which has been associated with affecting the activity of the NID. Therefore it is preferred that Thr308 at least is retained in the NID sequences of the synthetic polypeptides of the invention.
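Because the domain boundaries and phosphorylation sites above are all given in e2 numbering, converting between full-length and domain-relative positions is simple inclusive-range arithmetic. The Python sketch below is illustrative only; the constant and function names are hypothetical, and the coordinates are those stated in the text.

```python
# Inclusive (start, end) coordinates in e2 numbering, as given in the text.
MBD = (72, 173)        # SEQ ID NO: 1
NID = (272, 312)       # SEQ ID NO: 2
MBD_CORE = (78, 162)   # MBD region of particular interest
NID_CORE = (298, 309)  # NID region of particular interest


def domain_length(bounds):
    """Number of residues in an inclusive coordinate range."""
    start, end = bounds
    return end - start + 1


def contains(bounds, e2_position):
    """True if an e2-numbered position falls inside the inclusive range."""
    start, end = bounds
    return start <= e2_position <= end


def to_domain_position(e2_position, bounds):
    """Convert an e2-numbered position to a 1-based position within the domain."""
    if not contains(bounds, e2_position):
        raise ValueError(f"position {e2_position} lies outside {bounds}")
    return e2_position - bounds[0] + 1


# Position 297 of SEQ ID NO: 4 corresponds to position 26 of SEQ ID NO: 2.
print(to_domain_position(297, NID))
```

On these coordinates, Ser164 lies inside the 72 to 173 MBD but outside the 78 to 162 region, and Ser274 lies inside the 272 to 312 NID but outside the 298 to 309 region, which may matter when deciding which phosphorylation sites a shortened domain retains.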

The NID sequence of the invention may correspond to that of a naturally occurring MeCP2 NID sequence, for example the sequence of NID in the zebrafish homolog of MeCP2.

The MBD and NID sequences may have the same amount of percentage similarity to their respective wild type human MeCP2 sequences, or they may have different amounts of percentage similarity to their respective wild type human MeCP2 sequences. The percentage similarities for the MBD and NID may therefore consist of any combination of the above disclosed percentage similarities. Thus the present invention provides synthetic polypeptides comprising an MBD amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 1 and an NID amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 2, but preferably the MBD and NID amino acid sequences may have at least 75%, 80%, 85%, 88%, 90%, 92%, 94%, 95%, 96%, 97%, 98% or 99% similarity to the human MeCP2 domain sequences, further preferably at least 90% similarity. Similarly, the MBD sequence may have at least 70%, 75%, 80%, 85%, 88%, 90%, 92%, 94%, 95%, 96%, 97%, 98% or 99% similarity to the human MeCP2 MBD sequence whilst the NID sequence of the same synthetic polypeptide may have at least 70%, 75%, 80%, 85%, 88%, 90%, 92%, 94%, 95%, 96%, 97%, 98% or 99% similarity to the human MeCP2 NID sequence. Preferably one or both of the MBD and NID sequences will consist of or comprise their corresponding human or mouse domain sequence.

The term “similarity” refers to a degree of similarity between proteins or polypeptide sequences taking into account differences in amino acids at aligned positions of the sequences, but in which the functional similarity of the different amino acid residues, in view of almost equal size, lipophilicity, acidity, etc., is also taken into account. A percentage similarity can be calculated by optimal alignment of the sequences using a similarity-scoring matrix such as the Blosum62 matrix described in Henikoff S. and Henikoff J. G., P.N.A.S. USA 1992, 89: 10915-10919. Calculation of the percentage similarity and optimal alignment of two sequences using the Blosum62 similarity matrix and the algorithm of Needleman and Wunsch (J. Mol. Biol. 1970, 48: 443-453) can be performed using the GAP program of the Genetics Computer Group (GCG, Madison, Wis., USA) using the default parameters of the program.

Exemplary parameters for amino acid comparisons for similarity in the present invention use the Blosum62 matrix (Henikoff and Henikoff, supra) in association with the following settings for the GAP program:

-   Gap penalty: 8
-   Gap length penalty: 2
-   No penalty for end gaps.
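To illustrate how a percentage similarity differs from a percentage identity, the following Python sketch scores an alignment that has already been produced (for example by the GAP program). It is a simplified stand-in, not the GAP calculation: instead of the Blosum62 matrix it treats residues as similar when they fall in the same small group (Ala/Val/Leu/Ile, Ser/Thr, Asp/Glu, Asn/Gln, Lys/Arg, Phe/Tyr), and it assumes that columns containing a gap are excluded from the denominator. The names `SIMILARITY_GROUPS`, `residues_similar` and `percent_similarity` are hypothetical.

```python
# Stand-in similarity groups (an assumption for illustration only; the text
# specifies the Blosum62 matrix, which is not reproduced here).
SIMILARITY_GROUPS = ["AVLI", "ST", "DE", "NQ", "KR", "FY"]


def residues_similar(a, b):
    """True if two residues are identical or share a stand-in similarity group."""
    if a == b:
        return True
    return any(a in group and b in group for group in SIMILARITY_GROUPS)


def percent_similarity(aligned_a, aligned_b):
    """Percentage of gap-free aligned columns holding identical or similar residues.

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    Excluding gapped columns from the denominator is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    similar = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if residues_similar(a, b):
            similar += 1
    if compared == 0:
        return 0.0
    return 100.0 * similar / compared
```

A matrix-based implementation would replace `residues_similar` with a lookup of the substitution score for each aligned pair; the counting logic would otherwise be unchanged.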

Functional polymorphic forms of MBD and NID from mice and humans, and homologues of these domains from MeCP2 of other species, may be included in the polypeptides of the present invention. Variants of these domains in the polypeptides that also form part of the present invention are natural or synthetic variants that may contain variations in the amino acid sequence due to deletions, substitutions, insertions, inversions or additions of one or more amino acids in said sequence or due to an alteration to a moiety chemically linked to a protein. For example, a protein variant may be an altered carbohydrate or PEG structure attached to a protein. The polypeptides of the invention may include at least one such protein modification.

“Variants” of a polypeptide domain or protein, as used herein, refers to a polypeptide domain or protein resulting when a polypeptide is modified by one or more amino acids (e.g. insertion, deletion or substitution), or which comprises a protein modification, or which contains modified or non-natural amino acids. Substitutional variants of polypeptides are those in which at least one residue in the amino acid sequence has been removed and a different residue inserted in its place. The domains in the polypeptides of the present invention can contain conservative changes, wherein a substituted amino acid has similar structural or chemical properties, or more rarely non-conservative substitutions, for example, replacement of a glycine with a tryptophan, as long as the domains retain function. Variants may also include sequences with amino acid deletions or insertions, or both. Guidance in determining which amino acid residues may be substituted, inserted, or deleted without abolishing biological or immunological activity may be found using computer programs well known in the art.

The term “conservative substitution” relates to the substitution of one or more amino acid residues with amino acid residues having similar biochemical properties. Typically, conservative substitutions have little or no impact on the activity of a resulting polypeptide sequence. For example, a conservative substitution in a binding domain may be an amino acid substitution that does not substantially affect the ability of the domain to bind to its binding partner(s) or otherwise perform its usual biological function. Screening of variants of the polypeptide domains of the present invention can be used to identify which amino acid residues can tolerate an amino acid substitution. In one example, the relevant biological activity of a polypeptide having a modified domain is not altered by more than 25%, preferably not more than 20%, especially not more than 10%, when one or more conservative amino acid substitutions are effected.

One or more conservative substitutions can be included in a MBD or NID of a polypeptide of the present invention. In one example, 10 or fewer conservative substitutions are included in the domains. A polypeptide of the invention may therefore include 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more conservative substitutions of the MBD and/or NID domains. A polypeptide can be produced to contain one or more conservative substitutions by manipulating the nucleotide sequence that encodes that polypeptide using, for example, standard procedures such as site-directed mutagenesis or PCR. Alternatively, a polypeptide can be produced to contain one or more conservative substitutions by using peptide synthesis methods, for example as known in the art.

Examples of amino acids which may be substituted for an original amino acid in a protein and which are regarded as conservative substitutions include: Ser for Ala; Lys for Arg; Gln or His for Asn; Glu for Asp; Asn for Gln; Asp for Glu; Pro for Gly; Asn or Gln for His; Leu or Val for Ile; Ile or Val for Leu; Arg or Gln for Lys; Leu or Ile for Met; Met, Leu or Tyr for Phe; Thr for Ser; Ser for Thr; Tyr for Trp; Trp or Phe for Tyr; and Ile or Leu for Val. In one embodiment, the substitutions are among Ala, Val, Leu and Ile; among Ser and Thr; among Asp and Glu; among Asn and Gln; among Lys and Arg; and/or among Phe and Tyr. However, a substitution may not be considered conservative where it results in the removal of a site of phosphorylation within the polypeptide sequence. Further information about conservative substitutions can be found in, among other locations, Ben-Bassat et al., (J. Bacteriol. 169:751-7, 1987), O'Regan et al., (Gene 77:237-51, 1989), Sahin-Toth et al., (Protein Sci. 3:240-7, 1994), Hochuli et al., (Bio/Technology 6:1321-5, 1988), WO 00/67796 (Curd et al.) and in standard textbooks of genetics and molecular biology.
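The listed substitutions can be expressed as a lookup table keyed by the original residue. The Python sketch below is illustrative only: the names `CONSERVATIVE`, `PHOSPHO_SITES_E2` and `is_listed_conservative` are hypothetical, changes are written in the form used in Tables 1 and 2 (e.g. R133K, e2 numbering), and any change at one of the phosphorylation sites named in this specification (Ser80, Ser86, Thr148, Thr149, Ser164, Ser274, Thr308) is treated as non-conservative, which is a deliberately strict assumption.

```python
import re

# Original residue -> replacements listed as conservative ("Ser for Ala" is "A": "S").
CONSERVATIVE = {
    "A": "S", "R": "K", "N": "QH", "D": "E", "Q": "N", "E": "D",
    "G": "P", "H": "NQ", "I": "LV", "L": "IV", "K": "RQ", "M": "LI",
    "F": "MLY", "S": "T", "T": "S", "W": "Y", "Y": "WF", "V": "IL",
}

# Phosphorylation sites named in the text (e2 numbering).
PHOSPHO_SITES_E2 = {80, 86, 148, 149, 164, 274, 308}


def is_listed_conservative(change):
    """Check a change such as 'R133K' against the listed conservative substitutions.

    Assumption: any change at a named phosphorylation site is rejected, even
    if the replacement residue appears in the list.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", change)
    if match is None:
        raise ValueError(f"unrecognised change: {change!r}")
    original, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if position in PHOSPHO_SITES_E2:
        return False
    return replacement in CONSERVATIVE.get(original, "")
```

A positive result only means that the change appears in the list; it is not evidence that the change is tolerated. For example, K305R is an Arg-for-Lys change, yet Table 2 lists it among the harmful changes in the NID.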

Substitutions causing loss or decrease of function of MBD and NID are known in the art, not least due to the association of some with Rett syndrome and other MeCP2 associated disorders. Examples of such harmful changes or mutations include those shown in Table 1 and FIG. 2C in the MBD, and those shown in Table 2 and FIG. 2D in NID. Thus the skilled person will understand that these harmful changes should not be included in the MBD and NID domains of the polypeptides of the invention.

TABLE 1 Harmful and benign amino acid changes in the MBD. According to convention, all amino acid numbers given in the following refer to the human and mouse e2 isoforms. *those found in hemizygous males are in bold, and those in heterozygous females are in italics; **those found in non-mammalian vertebrates are provided in italics. The “benign changes” listed below are not known to be associated with an MeCP2-associated disorder, therefore they are probably benign.

  Harmful changes associated with:                 Benign changes present in:
  Classical   Atypical RTT or other                General          Other [vertebrate]
  RTT¹³       intellectual disability¹³            population¹⁴*    species**
  ---------   -------------------------            -------------    ------------------
  L100V       S86C ♀                               P72L /S          P72L
  L100R       P93S ♀                               P75L             A73S
  P101R       D97Y/E ♀                             K82R             V74A
  P101S       P101R ♀                              R85H             A77S
  P101H       G103D ♀                              R89C             P81A
  P101L       W104R ♀                              R91W             I87V
  R106W       G114A ♀                              S113F            D96E
  R106Q       Y120D ♀                              R115H            T99S
  R106L       D121G ♀                              Q128E            E102Q
  L108H       V122M ♀                              A140G            Y120F
  R111G       N126S ♀                              K144R            Q128N/E
  L124F       G129V ♀                              D147E /N         I139M
  P127L       R133H/G ♀                            T160S            E143Q
  Q128P       E137G ♂s only                        K171N            S149I
  A131D       A140V ♀s + ♂s                        P172L            L150T
  R133C       Y141C ♀                              P173A            Q170K
  R133P       D151G ♀                                               K171R
  R133L       P152A ♀s + ♂s                                         P172Q
  S134C       F155C ♀
  S134F       D156G ♀
  S134P       T160S ♀s + ♂s
  K135E       G161E/W
  L138S       P172S ♂s only
  P152R
  F155S
  D156E
  D156A
  F157L
  F157I
  T158M
  T158A
  G161V

Tables 1 and 2 also list changes that have no known association with an MECP2-related disorder, and that are therefore believed to be benign. Thus the skilled person will understand that such apparently benign changes may optionally be included in synthetic polypeptides of the invention.

TABLE 2 Harmful and benign amino acid changes in the NID. According to convention, all amino acid numbers given in the following refer to the human and mouse e2 isoforms. *those found in hemizygous males are in bold, and those in heterozygous females are in italics; **those found in mouse are in bold, and those found in non-mammalian vertebrates are provided in italics.

  Harmful changes associated with:                 Benign changes present in:
  Classical   Atypical RTT or other                General          Present in other
  RTT¹³       intellectual disability¹³            population¹⁴*    [vertebrate] species
  ---------   -------------------------            -------------    --------------------
  P302T       K286R ♀                              P272L            G273S/A
  P302S       V300I ♀                              A278T /V         S274A
  P302H       I303M ♀                              A279S/P          V275A/L
  P302L       K304E/R ♀                            A291T            V276A
  P302R       R309W ♀s + ♂s                        A287V/P          A278I
  K305R       T311M ♀                              V288M            A279L
  K305T                                            R294Q/P/ /G      A280T
  R306C                                            S295T            A281E
  R306H                                            T311A            E282A
                                                                    A288I/L
                                                                    I293A
                                                                    R294K
                                                                    S295P
                                                                    V296L
                                                                    Q297H(mouse)/L
                                                                    T299R
                                                                    V300A
                                                                    V312I/L

The biological activity of the MBD and NID domains that is of particular interest for the invention is the ability to recruit members of the NCoR/SMRT co-repressor complex to methylated DNA. Therefore it is preferred that the synthetic polypeptides of the invention are capable of recruiting NCoR/SMRT co-repressor complex components to methylated DNA. The NCoR/SMRT co-repressor complex components include NCoR, HDAC3, SIN3A, GPS2, SMRT, TBL1X and TBLR1. Preferably the synthetic polypeptides are capable of recruiting TBL1X or TBLR1 to methylated DNA.

The inventors have identified the MBD and NID domains as being key to the activity that is required of therapeutic MeCP2 in order for it to compensate for the reduced activity of MeCP2 in Rett syndrome and related disorders. Therefore whilst it is required that the MBD and NID domains are biologically active in the synthetic polypeptides of the invention, amino acid sequences in other parts of the wild type MeCP2 protein may be altered, for example by deletion of amino acids. Thus the synthetic polypeptides of the invention may have a deletion of at least 50 amino acids when compared to the full length human MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4). Said deletion of at least 50 amino acids may be a deletion of at least 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, 250, 275, or 300 amino acids when compared to the full length human MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4). Preferably said deletion is of at least 200 amino acids when compared to the full length human MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4).

Such deletion of at least 50 amino acids may be assessed by preparing an alignment (see above) of the amino acid sequence of interest with the human MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4). This will identify any regions in which amino acid residues in the MeCP2 sequences have been deleted because there will be gaps in the sequence of interest aligned to the MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4). Preferably at least some of the amino acids that have been deleted will be consecutive within the MeCP2 sequence, such that said deletion of at least 50 amino acids will include the deletion of at least 5, 10, 15, 20, 30, 40, 50 or more consecutive amino acids within the MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4).

The deletion of the at least 50 amino acids will be apparent from the alignment with both the e1 and e2 sequences, therefore any deletion that is present in the N-terminal region of MeCP2 and that is only associated with one of the e1 and e2 sequences, should not be considered a deletion to be counted as part of the at least 50 consecutive amino acids deleted in accordance with the invention. However, a deletion that is present in the N-terminal region of MeCP2 and that is present in both the e1 and e2 sequence alignments should be considered a deletion and counted as part of the at least 50 deleted amino acids in accordance with the invention.
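By way of illustration only, the assessment described above may be sketched in Python as follows. The sketch assumes that the alignments with the e1 and e2 sequences have already been prepared by any suitable program and are supplied as gapped strings of equal length, with "-" marking a gap. The function names are hypothetical, and taking the smaller of the two counts is merely one possible reading of the requirement that the deletion be apparent from both alignments.

```python
def deletion_stats(ref_aligned: str, query_aligned: str) -> tuple[int, int]:
    """Return (total deleted residues, longest consecutive deleted run).

    ref_aligned   : gapped MeCP2 reference sequence (e1 or e2)
    query_aligned : gapped sequence of interest, same length as ref_aligned
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    total = longest = run = 0
    for ref_res, query_res in zip(ref_aligned, query_aligned):
        if ref_res == "-":
            # Insertion in the sequence of interest: it occupies no reference
            # position, so it is neither counted nor allowed to break a run.
            continue
        if query_res == "-":
            # Reference residue with no counterpart: a deleted residue.
            total += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return total, longest


def deletion_vs_both(e1_pair: tuple[str, str], e2_pair: tuple[str, str]) -> tuple[int, int]:
    """Assumption: a deletion only counts to the extent seen in BOTH alignments,
    so the smaller value from the e1 and e2 alignments is reported."""
    total_e1, run_e1 = deletion_stats(*e1_pair)
    total_e2, run_e2 = deletion_stats(*e2_pair)
    return min(total_e1, total_e2), min(run_e1, run_e2)


# Toy strings, not real MeCP2 sequences.
e1_pair = ("ABCDEFGH", "AB---FGH")
e2_pair = ("XABCDEFGH", "-AB---FGH")
print(deletion_vs_both(e1_pair, e2_pair))
```

In the toy example the extra N-terminal residue missing only in the second alignment is not counted, which mirrors the treatment of isoform-specific N-terminal differences described above.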

The amino acids deleted from the wild-type MeCP2 e1 and e2 sequences may be replaced with some other useful amino acid sequence. For example, a deletion of at least 50 amino acids may have occurred when compared to the full length human MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4), but those deleted amino acids may have been replaced, at least in part, with amino acid sequence providing a linker, a tag and/or a signaling peptide.

This may be identified in an alignment of the synthetic polypeptide with the MeCP2 e1 and e2 sequences by a stretch of amino acid sequence in the synthetic polypeptide that does not match the MeCP2 sequence, and wherein that stretch of unmatched amino acid sequence corresponds to a useful, or purposive, heterologous sequence. Thus the invention provides, in at least some embodiments, synthetic polypeptides that have MeCP2 activity associated with the MBD and NID sequences, but that can also include useful heterologous sequences without requiring the synthetic polypeptides to be larger than the wild type MeCP2 protein; since large parts of the MeCP2 sequence can be left out of the synthetic polypeptides of the invention, the heterologous sequence(s) can be included whilst maintaining a relatively small overall size for the synthetic polypeptide.

As an alternative to the above-mentioned deletion of at least 50 amino acids, or in addition to it, the synthetic polypeptide of the invention having the MBD and NID amino acid sequences may have alterations to the polypeptide amino acid sequences such that it has less than 90% identity to the amino acid sequences of MeCP2, as depicted in SEQ ID NOs: 3 and 4, over the entire length of the amino acid sequences of MeCP2, as depicted in SEQ ID NOs: 3 and 4. Said less than 90% identity will be apparent from the comparison with both the e1 and e2 sequences, therefore any such identity solely due to alterations in the N-terminal region of MeCP2, which are only associated with one of the e1 and e2 sequences, will not be considered as the synthetic polypeptide having less than 90% identity in accordance with the invention. Preferably said identity will be less than 85%, 80%, 75%, 70%, 65%, 60% or 55%. It is particularly preferred that said identity will be less than 60%.

Synthetic polypeptides of the invention will generally have the structure:

A-B-C-D-E

wherein portion B of the synthetic polypeptide is the MBD amino acid sequence, as described above, and portion D of the synthetic polypeptide is said NID amino acid sequence, as described above. As explained above, however, parts of the synthetic polypeptide other than the MBD and NID domains, i.e. portions A, C and E, may have alterations compared to the wild type MeCP2 sequence. Thus the synthetic polypeptide may include alterations in accordance with one or more of the following: portion A of the synthetic polypeptide is less than 40 amino acids long and/or has less than 95% identity to the amino acid sequences as depicted in SEQ ID NOs: 5 and 6, calculated over the entire length of the amino acid sequences as depicted in SEQ ID NOs: 5 and 6; portion C of the synthetic polypeptide is less than 20 amino acids long and/or has less than 95% identity to the amino acid sequence as depicted in SEQ ID NO: 7, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 7; and portion E of the synthetic polypeptide is absent, a protein tag, and/or has less than 95% identity to the amino acid sequence as depicted in SEQ ID NO: 8, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 8.

Portion A of the synthetic polypeptide, corresponding to the N-terminal portion of the polypeptide and the area adjacent to the amino end of the MBD amino acid sequence when the MBD amino acid sequence is N-terminal to the NID amino acid sequence, may be less than 40 amino acids, preferably less than 35, 30, 25, or 20 amino acids. It is particularly preferred that portion A is less than 25 amino acids. Additionally or alternatively, portion A may have less than 95% identity to the amino acid sequences as depicted in SEQ ID NOs: 5 and 6, calculated over the entire length of the amino acid sequences as depicted in SEQ ID NOs: 5 and 6; preferably the identity is less than 90%, less than 85%, less than 80%, less than 75%, less than 70%, less than 65%, less than 60%, or less than 50%. Preferably portion A is truncated compared to the natural sequences of MeCP2 e1 and e2, such that portion A is less than 72 amino acids long, preferably less than 70, 65, 50, 55, 45, 30, or 25 amino acids. Further preferably, portion A will be truncated to such an extent that SEQ ID NOs: 5 and 6 are essentially not present. For example, SEQ ID NOs: 5 and 6 may have been deleted from the amino acid sequence of the synthetic polypeptide, and optionally replaced with an N-terminal tag.

Thus the amino acid sequences specific to e1 and e2, which are 24 amino acids long in human e1 (29 amino acids long in mouse e1) and 9 amino acids long in e2 (human/mouse) and which are encoded by exons 1 and 2 and the first 10 base pairs of exon 3 of MECP2 and Mecp2, may be included in the synthetic polypeptides of the invention, and so, for example, may provide the most N-terminal amino acid sequence of the synthetic polypeptide. The mouse e1 specific N-terminal amino acid sequence is MAAAAATAAAAAAPSGGGGGGEEERLEEK (SEQ ID NO: 9). The mouse e2 specific N-terminal amino acid sequence is MVAGMLGLREEK (SEQ ID NO: 10). The human e1 specific N-terminal amino acid sequence is MAAAAAAAPSGGGGGGEEERLEEK (SEQ ID NO: 11). The human e2 specific N-terminal amino acid sequence is MVAGMLGLREEK (SEQ ID NO: 12). Therefore a synthetic polypeptide of the invention may comprise an amino acid sequence corresponding to any of SEQ ID NOs 9-12, and optionally said amino acid sequence may be the most N-terminal sequence in the synthetic polypeptide of the invention.

Preferably however, these extreme N-terminal sequences specific to the wild type e1 and e2 MeCP2 will not be included in the synthetic polypeptides of the invention. Therefore preferably a synthetic polypeptide of the invention will not comprise an amino acid sequence corresponding to any of SEQ ID NOs 9-12.

Preferably the amino acid sequence adjacent to the amino end of the MBD amino acid sequence has less than 75% identity to the amino acid sequences as depicted in SEQ ID NOs: 5 and 6, calculated over the entire length of the amino acid sequences as depicted in SEQ ID NOs: 5 and 6, preferably less than 50%, and further preferably less than 30% identity.

Preferably the amino acid sequence adjacent to the amino end of the MBD amino acid sequence is less than 50 amino acids long, preferably less than 30 amino acids long or less than 20 amino acids long, and further preferably less than 10 amino acids long.

Portion C of the synthetic polypeptide, corresponding to the amino acid sequence between the MBD and NID amino acid sequences, may be less than 20 amino acids, preferably less than 15, 10, or 5 amino acids. Additionally or alternatively, portion C may have less than 95% identity to the amino acid sequence as depicted in SEQ ID NO: 7; preferably the identity is less than 90%, less than 85%, less than 80%, less than 75%, less than 70%, less than 65%, less than 60%, less than 50%, or less than 30%. Preferably portion C is truncated compared to the natural sequence of MeCP2, such that portion C is less than 98 amino acids long, preferably less than 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, or 15 amino acids long. Preferably the amino acid sequence between the MBD and NID amino acid sequences is less than 50 amino acids long, preferably less than 30 amino acids long, and further preferably less than 20 amino acids long.

An additional benefit to deletions within portion C of the synthetic polypeptide, i.e. the amino acid sequence between the MBD and NID sequences, is that the inventors have found that deletions in this portion can make the synthetic polypeptide less stable. Surprisingly, the inventors have found that such reduced stability may be beneficial to the utility of the synthetic polypeptide in the clinical setting, because it may reduce the chance of over-expression of the synthetic polypeptide. Therefore in some embodiments it is particularly preferred that there be a significant deletion in the amino acid sequence between the MBD and NID amino acid sequences, e.g. portion C of the above-noted generic structure, for example a deletion of at least 10, 15, 20, 30, 40, or 50 amino acids. Preferably the substitution or significant deletion of the amino acids will include substitution or significant deletion from the region from position 207 to position 271 of the full length human wild type MeCP2 polypeptide sequence (e2 isoform) as shown in SEQ ID NO: 4.

Portion E of the synthetic polypeptide, corresponding to the C-terminal portion of the polypeptide and the area adjacent to the carboxy end of the NID amino acid sequence when the MBD amino acid sequence is N-terminal to the NID amino acid sequence, may be absent, such that the carboxy end of the NID amino acid sequence corresponds with the C-terminus of the synthetic polypeptide. Alternatively, portion E may comprise a protein tag, for example so that portion E may be used to isolate or monitor/detect the synthetic polypeptide.

Additionally or alternatively, portion E may have less than 95% identity to the amino acid sequence provided in SEQ ID NO: 8, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 8; preferably the identity is less than 90%, less than 85%, less than 80%, less than 75%, less than 70%, less than 65%, less than 60%, or less than 50%. Thus portion E may comprise a deletion of all or a significant part of the amino acids at positions 313 to 486 of MeCP2 (SEQ ID NO: 4), optionally with replacement of the deleted sequence by a tag, and further optionally with a linker attaching the tag to the synthetic polypeptide.

Preferably the amino acid sequence adjacent to the carboxy end of the NID amino acid sequence has less than 75% identity to the amino acid sequence as depicted in SEQ ID NO: 8, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 8, preferably less than 50%, and further preferably less than 30% identity.

Preferably the amino acid sequence adjacent to the carboxy end of the NID amino acid sequence is less than 50 amino acids long, preferably less than 30 amino acids long or less than 20 amino acids long, and further preferably there is no amino acid sequence adjacent to the carboxy end of the NID amino acid sequence such that the carboxy end of the NID amino acid sequence corresponds with the C-terminus of the synthetic polypeptide.

The tag forming part of portion E of the synthetic polypeptides of the invention may be for monitoring or detection of the synthetic polypeptide or to allow post-translational regulation of the synthetic polypeptide, as explained further below. Examples of suitable tags for detection or monitoring of the polypeptide are known in the art, and include a polyhistidine tag, a FLAG tag, a Myc tag and a fluorescent protein tag such as enhanced green fluorescent protein (EGFP).

Preferably the synthetic polypeptides of the invention have less than 90% identity over the entire length of the amino acid sequences of MeCP2 as depicted in SEQ ID NO: 3 and SEQ ID NO: 4, preferably less than 80% identity, less than 70% identity, or less than 60% identity, and further preferably less than 40% identity.

The term “identity” refers to the extent to which two amino acid sequences have the same residues at the same positions in an alignment. The percentage identity as used herein is calculated across the length of a comparative sequence disclosed herein, for example one of SEQ ID NOs: 3 to 8, as described herein. Thus all residues in that comparative sequence should be aligned with the sequence of interest, and any gaps created during alignment of the sequence of interest with the full length of the comparative sequence should be taken into account when calculating the percentage identity, including when such “gaps” occur at either end of the sequence of interest in the alignment. However, any additional end sequence in the sequence of interest that aligns past the end of the comparative sequence, i.e. which does not align as such with the comparative sequence but which overhangs the end of the comparative sequence instead, should not be included when calculating the percentage identity. Thus when calculating the percentage identity, the identity score will be divided by the length of the comparative sequence, including any gaps that have been inserted into the comparative sequence as a result of the optimal alignment with the sequence of interest, and then multiplied by 100.
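Purely as an illustration, the calculation described in the preceding paragraph may be sketched in Python as follows. The sketch assumes an optimal alignment has already been prepared and is supplied as two gapped strings of equal length, with "-" marking a gap. The function name is hypothetical, and the identity score is taken here to be the number of aligned columns holding the same residue in both sequences.

```python
def percent_identity(comparative_aligned: str, interest_aligned: str) -> float:
    """Percentage identity calculated over the length of the comparative sequence.

    Columns in which the sequence of interest overhangs either end of the
    comparative sequence are excluded; gaps inserted inside the comparative
    sequence, and gaps in the sequence of interest (including at its ends),
    stay in the denominator.
    """
    if len(comparative_aligned) != len(interest_aligned):
        raise ValueError("aligned sequences must have equal length")

    # Locate the first and last columns that hold a comparative residue.
    residue_cols = [i for i, c in enumerate(comparative_aligned) if c != "-"]
    if not residue_cols:
        raise ValueError("comparative sequence contains no residues")
    start, end = residue_cols[0], residue_cols[-1] + 1

    # Drop the overhanging end columns of the sequence of interest.
    comparative = comparative_aligned[start:end]
    interest = interest_aligned[start:end]

    # Identity score: columns with the same residue in both sequences.
    matches = sum(1 for c, q in zip(comparative, interest) if c == q and c != "-")

    # Denominator: comparative length including any gaps inserted into it.
    return 100.0 * matches / len(comparative)


# Toy strings, not real MeCP2 sequences.
print(round(percent_identity("--ABCDEF-", "XXABC-EFG"), 2))
print(round(percent_identity("AB-CD", "ABXCD"), 2))
```

In the first toy alignment the two leading and one trailing overhanging residues of the sequence of interest are ignored, while its internal gap lowers the identity; in the second, the gap inserted into the comparative sequence enlarges the denominator.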

Methods of alignment of sequences for comparison are well known in the art. Various programs and alignment algorithms are described in: Smith & Waterman, Adv. Appl. Math. 2:482, 1981; Needleman & Wunsch, J. Mol. Biol. 48:443, 1970; Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444, 1988; Higgins & Sharp, Gene, 73:237-44, 1988; Higgins & Sharp, CABIOS 5:151-3, 1989; Corpet et al., Nuc. Acids Res. 16:10881-90, 1988; Huang et al. Computer Appls. in the Biosciences 8, 155-65, 1992; and Pearson et al., Meth. Mol. Bio. 24:307-31, 1994. Altschul et al., J. Mol. Biol. 215:403-10, 1990, presents a detailed consideration of sequence alignment methods and homology calculations. There are readily available programs that permit the preparation of sequence alignments and the calculation of percentage identity, such as the GAP program, running under GCG (Genetics Computer Group Inc., Madison, Wis., USA).

Other variants of the synthetic polypeptides of the invention can be, for example, functional variants such as salts, amides, esters, and specifically C-terminal esters, and N-acyl derivatives. Also included are peptides which are modified in vivo or in vitro, for example by glycosylation, amidation, carboxylation or phosphorylation.

A benefit of the invention is that by identifying the MBD and NID sequences as being key to the activity of MeCP2 that is deficient in Rett syndrome, the inventors have identified that significant portions of other sections of the MeCP2 protein may be removed when preparing a synthetic protein for use in treating or preventing a disorder such as Rett syndrome. Therefore the invention provides synthetic polypeptides that are truncated forms of MeCP2, comprising MBD and NID sequences but missing one or more sections of the amino acid sequences adjacent to the amino end of the MBD amino acid sequence (e.g. in portion A), between the MBD and NID amino acid sequences (e.g. in portion C), and adjacent to the carboxy end of the NID amino acid sequence (e.g. in portion E), when compared to the full length human MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4). Preferably a synthetic polypeptide of the invention will have fewer amino acids than the wild type MeCP2 protein, for example a synthetic polypeptide of the invention may consist of fewer than 430 amino acids, preferably fewer than 400, 350, 320, 270, or 200 amino acids, and further preferably fewer than 180 amino acids.

Synthetic polypeptides of the invention may suitably comprise a signal peptide, for example a nuclear localisation signal (NLS). A nuclear localisation signal may target a polypeptide for import into the cell nucleus by nuclear transport. Suitable nuclear localisation signals are known in the art, and include the SV40 nuclear localisation signal and the NLS of the native MeCP2 protein (amino acids 253 to 271 of SEQ ID NO: 4). The NLS may be situated in any part of the polypeptide, but preferably will be situated in the amino acid sequence linking the MBD to the NID.

Synthetic polypeptides of the invention may suitably comprise a cell-penetrating peptide (CPP). CPPs, also called protein-transduction domains, consist of short sequences of 8 to 30 amino acids in length that can facilitate entry of molecules into cells¹⁵. The CPP may be synthetic, designed specifically for the targeting of therapeutic molecules, or naturally occurring. Preferably the CPP will facilitate entry into neuronal cells. A preferred CPP for use with the synthetic polypeptides of the invention is the CPP of the trans-activator of transcription protein, Tat.

Synthetic polypeptides of the invention may suitably comprise a tag. Such tags are well known in the art and may be useful for polypeptide purification, detection/monitoring and/or post-translational regulation. Examples of suitable tags useful in purification or detection of a polypeptide are known in the art, and include a polyhistidine tag, a FLAG tag, a Myc tag and a fluorescent protein tag such as enhanced green fluorescent protein (EGFP).

A tag may be used for post-translational regulation of a polypeptide by, for example, providing the ability to control the post-translational degradation of the polypeptide. Examples of suitable tags that may be used with synthetic polypeptides of the invention to allow such control of post-translational degradation include a SMASh tag and a Destabilisation Domain (DD) of FKBP12. The SMASh tag is approximately 300 amino acids long and comprises a protease cleavage site followed by the protease, followed by a degron tag. The SMASh tag is fused to the protein of interest (POI) with the protease cleavage site and protease between the POI and the degron tag. This can be on either terminus of the POI. Normally, the protease self-cleaves the protease site, removing the degron tag so that the protein is not degraded. However, treatment with a drug such as Asunaprevir can inhibit the protease, which prevents removal of the degron tag, and so results in degradation of the attached POI. Since the administration of Asunaprevir has a dose-dependent effect on POI degradation, the use of a SMASh tag with Asunaprevir can allow post-translational regulation of the amount of the POI.

Similarly, the DD-FKBP12 is approximately 110 amino acids long and it destabilises the POI to which it is attached. It can be fused to either terminus of the POI, but it is preferably attached to the N-terminus as then it is generally more effective. The fusion protein produced will therefore generally be unstable but it can be protected by the administration of a molecule called Shield-1. Administration of Shield-1 has a dose-dependent effect on the prevention of the degradation of the POI.

The skilled person will appreciate that it may not be desirable to include certain types of tags, particularly those used for purification or detection during polypeptide synthesis such as Myc or EGFP, in the final therapeutic polypeptide that is delivered to a subject, as the tags may be immunogenic or active in an undesirable manner. Therefore the tags included in the synthetic polypeptides of the invention may be removable, for example by chemical agents or enzymatic means, such as proteolysis or intein splicing; this may allow the use of the tag during preparation of a synthetic polypeptide followed by removal of the tag before the polypeptide is introduced to an animal for treatments in accordance with therapeutic uses (e.g. protein replacement therapy) disclosed herein.

The synthetic polypeptides of the invention may comprise a linker sequence, for example to link the MBD sequence to the NID sequence, in place of the natural sequence that links these two sequences in MeCP2, or as a means to attach or insert a tag, CPP or NLS to a synthetic polypeptide of the invention. The design and use of such linkers are well known in the art. A suitable linker may comprise between 4 and 15 amino acids, preferably between 6 and 10 amino acids. Preferably the linker will consist of glycines, serines, or a combination of glycines and serines.
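As a minimal sketch only, a candidate linker may be checked against the composition and length preferences stated above as follows. The function name and the example linkers are hypothetical, and the stated ranges are assumed here to be inclusive of their end points.

```python
def check_linker(linker: str) -> dict:
    """Report whether a candidate linker meets the stated preferences.

    Assumption: "between 4 and 15" and "between 6 and 10" are read as
    inclusive ranges.
    """
    seq = linker.upper()
    return {
        # Glycines, serines, or a combination of the two.
        "gly_ser_only": len(seq) > 0 and set(seq) <= {"G", "S"},
        # Suitable length: 4 to 15 amino acids.
        "suitable_length": 4 <= len(seq) <= 15,
        # Preferred length: 6 to 10 amino acids.
        "preferred_length": 6 <= len(seq) <= 10,
    }


# Hypothetical example linkers, not sequences taken from the disclosure.
for candidate in ("GGSGGGSG", "GSG", "GGSAGG"):
    print(candidate, check_linker(candidate))
```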

Preferably the synthetic polypeptide sequences of the invention will show at least 80% similarity to one or more of the sequences ΔNC (SEQ ID NO: 13), ΔNIC (SEQ ID NO: 14), ΔN mouse (SEQ ID NO: 15), ΔNC mouse (SEQ ID NO: 16), and ΔNIC mouse (SEQ ID NO: 17), preferably at least 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% similarity. Optionally, as disclosed above, the synthetic polypeptide sequences of the invention may comprise an e1 or e2 specific sequence, preferably at the N-terminus of the synthetic polypeptide. Thus preferably the synthetic polypeptide sequences of the invention will show at least 80% similarity to a sequence consisting of one or more of the sequences: ΔNC (SEQ ID NO: 13); ΔNIC (SEQ ID NO: 14); ΔN mouse (SEQ ID NO: 15); ΔNC mouse (SEQ ID NO: 16); and ΔNIC mouse (SEQ ID NO: 17), and one of the e1 or e2 specific sequences (SEQ ID NOs: 9-12) at the N-terminus of the sequence, and preferably at least 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% similarity. Further preferably the synthetic polypeptide sequences of the invention may comprise or consist of the sequence ΔNC (SEQ ID NO: 13), ΔNIC (SEQ ID NO: 14), ΔN mouse (SEQ ID NO: 15), ΔNC mouse (SEQ ID NO: 16), ΔNIC mouse (SEQ ID NO: 17), or one of the sequences ΔNC (SEQ ID NO: 13), ΔNIC (SEQ ID NO: 14), ΔN mouse (SEQ ID NO: 15), ΔNC mouse (SEQ ID NO: 16), ΔNIC mouse (SEQ ID NO: 17), with one of the e1 or e2 specific sequences (SEQ ID NOs: 9-12) at the N-terminus of the sequence. ΔNC (SEQ ID NO: 13), ΔN mouse (SEQ ID NO: 15), and ΔNC mouse (SEQ ID NO: 16) include the wild type MeCP2 NLS sequence. ΔNIC (SEQ ID NO: 14) and ΔNIC mouse (SEQ ID NO: 17) include the SV40 NLS sequence.

It is particularly preferred that the synthetic polypeptide sequences of the invention will show at least 80% similarity to a sequence consisting of ΔNIC (SEQ ID NO: 14) and a human e1 or e2 specific sequence (SEQ ID NOs: 11 and 12) immediately adjacent the N-terminus of ΔNIC (SEQ ID NO: 14), preferably at least 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% similarity. Further preferably the synthetic polypeptide sequences of the invention may comprise or consist of ΔNIC (SEQ ID NO: 14), optionally with a human e1 or e2 specific sequence (SEQ ID NOs: 11 and 12) immediately adjacent the N-terminus of ΔNIC (SEQ ID NO: 14). Synthetic polypeptides of the invention may be used in the treatment or prevention of a neurological disorder associated with inactivating mutation of MECP2, for example Rett syndrome, as explained further below.

Nucleic Acid Constructs, Expression Vectors, Virions and Cells

The invention provides nucleic acid constructs encoding a synthetic polypeptide of the invention, and expression vectors comprising a nucleotide sequence encoding a synthetic polypeptide of the invention. The expression vector may be a viral vector, and thus the invention also provides a virion comprising an expression vector according to the invention. The invention also provides cells that comprise a synthetic genetic construct adapted to express a polypeptide of the invention, cells comprising a vector of the invention, and cells for producing a virion of the invention.

Nucleic acid constructs and/or expression vectors suitably comprise at least one expression control sequence operably linked to a nucleotide sequence encoding a synthetic polypeptide of the invention, to drive expression of the synthetic polypeptide. “Expression control sequences” are nucleotide sequences located upstream (5′ non-coding sequences), within, or downstream (3′ non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Expression control sequences include enhancers, promoters, translation leader sequences, introns, and polyadenylation signal sequences. They include natural and synthetic sequences as well as sequences that may be a combination of synthetic and natural sequences. As is noted herein, the term “expression control sequences” is not limited to promoters. However, some suitable expression control sequences useful in the present invention will include, but are not limited to, constitutive promoters, tissue-specific promoters, development-specific promoters, regulatable promoters and viral promoters.

Such expression control sequences generally comprise a promoter sequence and additional sequences which regulate transcription and translation and/or enhance expression levels. Suitable expression control sequences are well known in the art and include eukaryotic, prokaryotic, or viral promoters or poly-A signals. Expression control and other sequences will, of course, vary depending on the host cell selected or can be made inducible. Examples of useful promoters are the SV-40 promoter (Science 1983, 222: 524-527), the metallothionein promoter (Nature 1982, 296: 39-42), the heat shock promoter (Voellmy et al., P.N.A.S. USA 1985, 82: 4949-4953), the PRV gX promoter (Mettenleiter and Rauh, J. Virol. Methods 1990, 30: 55-66), the human CMV IE promoter (U.S. Pat. No. 5,168,062), the Rous Sarcoma virus LTR promoter (Gorman et al., P.N.A.S. USA 1982, 79: 6777-6781), or human elongation factor 1 alpha or ubiquitin promoter. Suitable control sequences to drive expression in animals, e.g. humans, are well known in the art. The expression control sequences can drive ubiquitous expression or tissue- or cell-specific expression. The expression control sequence can comprise, for example, a viral or human promoter. A suitable promoter can be ubiquitous (e.g. the CAG promoter), tissue restricted or tissue specific. For example, the NESTIN promoter may drive expression in the CNS and the TAU and SYNAPSIN promoters may drive expression in neurons. Preferably the promoter will be for expression of the nucleotide sequence in neuronal cells, for example the MECP2 or Mecp2 promoter. Many suitable control sequences are known in the art, and it would be routine for the skilled person to select suitable sequences for the expression system being used.

Expression of MECP2 is tightly controlled in animals. Therefore where nucleic acid constructs and expression vectors of the invention are to be used for expression of the synthetic polypeptides of the invention in an animal, for example in gene therapy, it is preferred that said expression be specific to neural cells, particularly neural cells of the brain and CNS. This may be accomplished, for example, through specific targeting of the nucleic acid constructs and expression vectors to the brain and/or the neural cells, through the use of delivery vehicles, such as AAV virions, and/or specific administration routes, such as by administration directly to the CNS. Additionally or alternatively, said neuron specific expression may be accomplished by the use of expression control sequences in the nucleic acid constructs and/or expression vectors that substantially limit the expression of the synthetic polypeptide to neural cells. Preferably those expression control sequences will be selected from the expression control sequences used to control the natural expression of the MECP2 gene.

The MECP2 gene contains a remarkably large, highly conserved 3′UTR, in which enhancers, silencers and many miRNA binding sites have been identified. Similarly, the MECP2 gene promoter region is also very large, and includes silencer, regulatory element and promoter sequences. Therefore preferably the expression vectors and/or nucleic acid constructs of the invention will include one or more of the expression control sequence elements from the MECP2 3′UTR. For example, gene therapy is disclosed herein that used an expression cassette that included an upstream core promoter element from the Mecp2 gene, and downstream microRNA (miR) binding sites and an AU-rich element. Therefore an expression vector and/or nucleic acid construct of the invention will preferably comprise one or more elements selected from: an upstream MECP2 or Mecp2 core promoter sequence (see, for example, nucleotides 200 to 329, inclusive, of SEQ ID NO: 65); one or more downstream miR binding sites from the MECP2 or Mecp2 3′UTR; and an AU-rich element from the MECP2 or Mecp2 3′UTR. Further preferably the one or more downstream miR binding sites from the MECP2 or Mecp2 3′UTR will comprise one or more, or all of the following miR binding sites: a binding site for mir-22 (nucleotides 1166 to 1195, inclusive, of SEQ ID NO: 65), a binding site for mir-19 (nucleotides 1196 to 1224, inclusive, of SEQ ID NO: 65), a binding site for mir-132 (nucleotides 1225 to 1252, inclusive, of SEQ ID NO: 65), and a binding site for mir-124 (nucleotides 1318 to 1324, inclusive, of SEQ ID NO: 65).

Preferably, the expression vector and/or nucleic acid construct of the invention may comprise an upstream CNS regulatory element from Mecp2 or MECP2 (see, for example, nucleotides 422 to 443, inclusive, of SEQ ID NO: 65) and/or an upstream silencer from Mecp2 or MECP2 (see, for example, nucleotides 142 to 203, inclusive, of SEQ ID NO: 65).

It is particularly preferred that the upstream region of the expression vector and/or nucleic acid construct of the invention will comprise the mMeP426 sequence (nucleotides 117 to 542 of SEQ ID NO: 65; see FIGS. 17B, C) and/or that the downstream region of the expression vector and/or nucleic acid construct of the invention will comprise the RDH1pA sequence (nucleotides 1166 to 1370 of SEQ ID NO: 65; see FIGS. 17B, C).
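For illustration only, the nucleotide ranges recited above may be handled programmatically as sketched below. The ranges are those given in the text and are treated as 1-based and inclusive; the dictionary keys, function names and the placeholder sequence are hypothetical, and the actual sequence of SEQ ID NO: 65 is not reproduced here.

```python
# 1-based, inclusive nucleotide ranges of SEQ ID NO: 65 as recited in the text.
ELEMENTS = {
    "mMeP426": (117, 542),
    "upstream silencer": (142, 203),
    "core promoter": (200, 329),
    "CNS regulatory element": (422, 443),
    "RDH1pA": (1166, 1370),
    "mir-22 site": (1166, 1195),
    "mir-19 site": (1196, 1224),
    "mir-132 site": (1225, 1252),
    "mir-124 site": (1318, 1324),
}


def element_length(name: str) -> int:
    """Length in nucleotides of an inclusive 1-based range."""
    start, end = ELEMENTS[name]
    return end - start + 1


def extract(sequence: str, name: str) -> str:
    """Slice an element out of a sequence, converting to 0-based indexing."""
    start, end = ELEMENTS[name]
    if end > len(sequence):
        raise ValueError(f"sequence too short for element {name!r}")
    return sequence[start - 1:end]


def contains(outer: str, inner: str) -> bool:
    """True if the inner element lies wholly within the outer element."""
    outer_start, outer_end = ELEMENTS[outer]
    inner_start, inner_end = ELEMENTS[inner]
    return outer_start <= inner_start and inner_end <= outer_end


# Placeholder standing in for SEQ ID NO: 65 (not the real sequence).
placeholder = "".join("ACGT"[i % 4] for i in range(1400))

print(element_length("mMeP426"), element_length("RDH1pA"))
print(len(extract(placeholder, "mir-124 site")))
print(all(contains("RDH1pA", site) for site in
          ("mir-22 site", "mir-19 site", "mir-132 site", "mir-124 site")))
```

On the recited coordinates, the upstream silencer, core promoter and CNS regulatory element all fall within the mMeP426 range, and the four miR binding sites all fall within the RDH1pA range.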

Of course, the skilled person will appreciate that one or more other sequence elements may also be desirable or required in an expression vector and/or nucleic acid construct of the invention, such as a translational initiation signal, e.g. a Kozak sequence, a polyadenylation signal, and binding sites for components of the polyadenylation machinery such as CstF (cleavage stimulation factor). The skilled person will be capable of designing an expression vector and/or nucleic acid construct in accordance with the invention having any and all such necessary or desirable well known sequences.

The human wild type MECP2 e1 isoform cDNA sequence is provided herein as SEQ ID NO: 18. The mouse wild type MECP2 e1 isoform cDNA sequence is provided herein as SEQ ID NO: 21.

Suitably the nucleic acid construct of the invention may comprise a sequence for encoding the cDNA sequence of ΔNC (SEQ ID NO: 19), ΔNIC (SEQ ID NO: 20), ΔN mouse (SEQ ID NO: 22), ΔNC mouse (SEQ ID NO: 23), or ΔNIC mouse (SEQ ID NO: 24). As explained above, optionally the synthetic polypeptides of the invention may comprise the e1 or e2 specific N-terminal sequences. The mouse e1 specific N-terminal amino acid sequence is encoded by the cDNA sequence ATGGCCGCCGCTGCCGCCACCGCCGCCGCCGCCGCCGCGCCGAGCGGAGGAGGAGGAGGAGGCGAGGAGGAGAGACTGGAGGAAAAG (SEQ ID NO: 25). The mouse e2 specific N-terminal amino acid sequence is encoded by the cDNA sequence ATGGTAGCTGGGATGTTAGGGCTCAGGGAGGAAAAGGGAGGAAAAG (SEQ ID NO: 26). The human e1 specific N-terminal amino acid sequence is encoded by the cDNA sequence ATGGCCGCCGCCGCCGCCGCCGCGCCGAGCGGAGGAGGAGGAGGAGGCGAGGAGGAGAGACTGGAAGAAAAG (SEQ ID NO: 27). The human e2 specific N-terminal amino acid sequence is encoded by the cDNA sequence ATGGTAGCTGGGATGTTAGGGCTCAGGGAAGAAAAG (SEQ ID NO: 28). Therefore the nucleic acid construct of the invention may comprise a sequence for encoding the cDNA sequence according to any of SEQ ID NOs: 25-28.
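As an illustrative check only, the correspondence between a recited cDNA sequence and the N-terminal amino acid sequence it is stated to encode may be confirmed by translation with the standard genetic code, as sketched below. The function and variable names are hypothetical; the two cDNA strings are those recited above for the human e1 and e2 specific sequences.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna: str) -> str:
    """Translate a cDNA sequence in frame from its first nucleotide."""
    cdna = cdna.upper().replace(" ", "")
    if len(cdna) % 3 != 0:
        # A length not divisible by three cannot be read as whole codons.
        raise ValueError("cDNA length is not a multiple of three")
    return "".join(CODON_TABLE[cdna[i:i + 3]] for i in range(0, len(cdna), 3))


# cDNA sequences as recited for SEQ ID NOs: 27 and 28.
HUMAN_E1_CDNA = (
    "ATGGCCGCCGCCGCCGCCGCCGCGCCGAGCGGAGGAGGAGGAGGAGGC"
    "GAGGAGGAGAGACTGGAAGAAAAG"
)
HUMAN_E2_CDNA = "ATGGTAGCTGGGATGTTAGGGCTCAGGGAAGAAAAG"

print(translate(HUMAN_E1_CDNA))  # compare with SEQ ID NO: 11
print(translate(HUMAN_E2_CDNA))  # compare with SEQ ID NO: 12
```

The same function may be applied to any other coding sequence of the disclosure; a sequence whose length is not a whole number of codons is reported as an error rather than silently truncated.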

Further preferably the nucleic acid construct of the invention may comprise a sequence for encoding the cDNA sequence of SEQ ID NO: 27 or 28 immediately adjacent to the cDNA sequence of ΔNIC (SEQ ID NO: 20).

Due to the degeneracy of the genetic code, polynucleotides encoding an identical or substantially identical amino acid sequence may utilise different specific codons (e.g. synonymous base substitutions). All polynucleotides encoding the synthetic polypeptides as defined above are considered to be part of the invention.

The invention provides an expression vector comprising a nucleotide sequence encoding a synthetic polypeptide of the invention. Such vectors suitably comprise an isolated or synthetic nucleic acid construct as described above.

The vectors according to the invention are suitable for transforming a host cell. Examples of suitable cloning vectors are plasmid vectors such as pBR322, the various pUC, pEMBL and Bluescript plasmids, or viral vectors such as HVT (Herpes Virus of Turkeys), MDV (Marek Disease Virus), ILT (Infectious Laryngotracheitis Virus), FAV (Fowl Adenovirus), FPV (Fowlpox Virus), or NDV (Newcastle Disease Virus). pcDNA3.1 is a particularly preferred vector for expression in animal cells.

After the polynucleotide has been cloned into an appropriate vector, the construct may be transferred into the cell, bacteria, or yeast by means of an appropriate method, such as electroporation, CaCl2 transfection or lipofectins. When a baculovirus expression system is used, the transfer vector containing the polynucleotide may be transfected together with a complete baculo genome.

These techniques are well known in the art and the manufacturers of molecular biological materials (such as Clontech, Stratagene, Promega, and/or Invitrogen) provide suitable reagents and instructions on how to use them. Furthermore, there are a number of standard reference textbooks providing further information on this, e.g. Rodriguez, R. L. and D. T. Denhardt, ed., “Vectors: A survey of molecular cloning vectors and their uses”, Butterworths, 1988; Current protocols in Molecular Biology, eds.: F. M. Ausubel et al., Wiley N.Y., 1995; Molecular Cloning: a laboratory manual, supra; and DNA Cloning, Vol. 1-4, 2nd edition 1995, eds.: Glover and Hames, Oxford University Press.

Details of preferred proteins according to the present invention for expression via the vector are described above.

The vector may be adapted to provide transient expression in a host cell or stable expression. Stable expression can be achieved, for example, through integration of the nucleotide sequence encoding the synthetic polypeptide into the genome of the host cell.

Suitable viral vectors include retroviral vectors (including lentiviral vectors), adenoviral vectors, adeno-associated viral (AAV) vectors, and alphaviral vectors. Preferably the viral vector will be an AAV vector, such as AAV1, AAV2, AAV4, AAV5, AAV6, AAV8 or AAV9. Preferably the AAV vector will be a self-complementary (sc) AAV vector.

The vector of the present invention may be present in a virion. Thus the present invention also provides a virion comprising a vector in accordance with the present invention. Preferably the virion and/or viral vector will be for expression in cells of the central nervous system (CNS), such as neuronal cells. Thus preferably a virion of the invention will comprise a capsid and/or inverted terminal repeats (ITRs) from one or more of AAV1, AAV2, AAV4, AAV5, AAV6, AAV8, and AAV9. Preferably the AAV will be a self-complementary (sc) AAV vector. Further preferably, the ITR and capsid proteins may be from different serotypes, for example ITRs from AAV2 may be used with capsid proteins from AAV9 to form scAAV virions.

Vectors according to the present invention can be used in transforming cells for expression of a protein according to the present invention. This can be done in cell culture to produce recombinant protein for harvesting, or it can be done in vivo to deliver a protein according to the present invention to an animal.

Thus the present invention also provides a cell population in which cells comprise a synthetic genetic construct adapted to express a protein according to the present invention. Said cell population may be present in a cell-culture system in a suitable medium to support cell growth.

The cells can be eukaryotic or prokaryotic.

Polynucleotides of the present invention may be cloned into any appropriate expression system. Suitable expression systems include a bacterial expression system (e.g. Escherichia coli DH5α), a viral expression system (e.g. Baculovirus), a yeast system (e.g. Saccharomyces cerevisiae) or eukaryotic cells (e.g. COS-7, CHO, BHK, HeLa, HD11, DT40, CEF, or HEK-293T cells). A wide range of suitable expression systems are available commercially. Typically the polynucleotide is cloned into an appropriate vector under control of a suitable constitutive or inducible promoter and then introduced into the host cell for expression.

Suitably the cells are animal cells, more preferably they are mammalian cells, and most preferably human cells. Suitably the cells comprise a vector as set out above.

Preferably the cells are adapted such that expression of the protein according to the present invention is inducible.

It is particularly preferred that the cells comprise an expression vector for expressing a synthetic polypeptide of the invention, and the cell is suitable or adapted for producing a virion comprising an expression vector of the invention. Thus the cells may be used to produce virions for use in gene therapy treatment of Rett syndrome and related disorders. Preferably the virions will comprise AAV vectors for expressing a polypeptide of the invention, and further preferably the AAV vectors will comprise AAV9 and/or AAV2.

Suitable host cells for producing AAV virions include microorganisms, yeast cells, insect cells, and mammalian cells, that can be, or have been, used as recipients of a heterologous DNA molecule. The term includes the progeny of the original cell which has been transfected. Thus, a “host cell” as used herein generally refers to a cell which has been transfected with an exogenous DNA sequence. Cells from the stable human cell line, 293 (readily available through, e.g., the American Type Culture Collection under Accession Number ATCC CRL1573) can be used in the practice of the present invention. Particularly, the human cell line 293 is a human embryonic kidney cell line that has been transformed with adenovirus type-5 DNA fragments, and expresses the adenoviral E1a and E1b genes. The 293 cell line is readily transfected, and provides a particularly convenient platform in which to produce AAV virions.

Suitably, for in vivo delivery, virions of the invention, such as AAV virions, may be formulated into pharmaceutical compositions. Suitably, pharmaceutical compositions will comprise sufficient genetic material to produce a therapeutically effective amount of the synthetic polypeptide of the invention, i.e., an amount sufficient to reduce, ameliorate or prevent symptoms of the disorders associated with reduced MeCP2 activity, such as Rett syndrome. The pharmaceutical compositions may also contain a pharmaceutically acceptable excipient. Such excipients include any pharmaceutical agent that does not itself induce the production of antibodies harmful to the individual receiving the composition, and which may be administered without undue toxicity. Pharmaceutically acceptable excipients include, but are not limited to, sorbitol, Tween80, and liquids such as water, saline, glycerol and ethanol. Pharmaceutically acceptable salts can be included therein, for example, mineral acid salts such as hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids such as acetates, propionates, malonates, benzoates, and the like. Additionally, auxiliary substances, such as wetting or emulsifying agents, pH buffering substances, and the like, may be present in such vehicles.

As is apparent to those skilled in the art, an effective amount of viral vector which must be added can be empirically determined. Administration can be effected in one dose, continuously or intermittently throughout the course of treatment. Methods of determining the most effective means and dosages of administration are well known to those of skill in the art and will vary with the viral vector, the composition of the therapy, and the subject being treated. Single and multiple administrations can be carried out with the dose level and pattern being selected by the treating physician.

Pharmaceutical Compositions, Methods of Prevention and Treatment, and Use in Same

The synthetic polypeptides of the invention are useful for replacing defective MeCP2 in the cells of those affected by Rett syndrome or related disorders. Thus the invention provides a method of treating or preventing disease in an animal comprising administering a synthetic polypeptide of the invention. Preferably the disease is a neurological disorder associated with inactivating mutation of MECP2, for example Rett syndrome. Preferably the animal is a human patient.

The administering in the methods of the invention may comprise administering a composition comprising a synthetic polypeptide of the invention, administering an expression vector of the invention, and/or administering a virion of the invention.

The invention therefore also provides synthetic polypeptides, expression vectors and virions for the treatment or prevention of a neurological disorder associated with inactivating mutation of MECP2, for example Rett syndrome, and the use of a synthetic polypeptide of the invention, an expression vector of the invention, or a virion of the invention, in the manufacture of a medicament for the treatment or prevention of a neurological disorder associated with inactivating mutation of MECP2, for example Rett syndrome.

The disorders that may be treated or prevented as provided herein include those involving a reduction or inactivation in the activity of MeCP2. Thus, as used herein, the phrase “inactivating mutation” encompasses mutations that result in reduced MeCP2 activity as well as mutations that abolish MeCP2 activity, and in particular the activity due to the ability of MeCP2 to recruit members of the NCoR/SMRT co-repressor complex to methylated DNA. Such disorders may be recognised, for example, by the identification of mutations in MeCP2 in the subject having, or at risk of having, the disorder. For example, recurrent (e.g. A140V) or sporadic mutations in males have been found to be causative in some cases of X-linked intellectual disability. Similarly, in females, “hypomorphic” mutations of this kind are associated with learning disability, and exome sequencing of children diagnosed with developmental delay is also revealing mutations in MECP2. Such mutations may affect the ability of the MeCP2 to recruit components of the NCoR/SMRT co-repressor complex to methylated DNA, and/or may generally affect the stability of the protein.

In the context of the methods and medical uses of the present invention, the animal to be treated may be anyone requiring the treatment, or anyone deemed to be at risk of developing a relevant disorder. Suitably the animal may be a mammal, preferably a primate and further preferably a human subject.

The animal to be treated may present with symptoms suggestive of a MeCP2 associated disorder. Alternatively, the subject may appear to be asymptomatic but deemed to be at risk of developing an MECP2-related disorder caused by loss of MeCP2 function, such that preventative treatment with synthetic polypeptides of the invention is desirable. Suitably an asymptomatic subject may be a subject who is believed to be at elevated risk of having a MeCP2-associated disorder. Such an asymptomatic subject may be one who has a family history of a MeCP2-associated disorder, or one who has undergone genetic testing that indicates a mutation in the MECP2 gene.

The present invention envisions treating or preventing disorders associated with reduced MeCP2 activity by the administration of a therapeutic agent, i.e., a synthetic polypeptide composition, a nucleic acid construct, an expression vector, and/or a virion of the invention. Administration of the therapeutic agents in accordance with the present invention may be continuous or intermittent, depending, for example, upon the recipient's physiological condition, whether the purpose of the administration is therapeutic or prophylactic, and other factors known to skilled practitioners. The administration of the agents of the invention may be essentially continuous over a preselected period of time or may be in a series of spaced doses. Both local and systemic administration are contemplated.

One or more suitable unit dosage forms having the therapeutic agent(s) of the invention can be administered by a variety of routes including parenteral, including by intravenous and intramuscular routes, as well as by direct injection into the tissue directly associated with the reduced MeCP2 activity. For example, the therapeutic agent may be directly injected into the brain. Alternatively the therapeutic agent may be introduced intrathecally for brain and spinal cord conditions. In another example, the therapeutic agent may be introduced intramuscularly for viruses that traffic back to affected neurons from muscle, such as AAV, lentivirus and adenovirus. The formulations may, where appropriate, be conveniently presented in discrete unit dosage forms and may be prepared by any of the methods well known to pharmacy. Such methods may include the step of bringing into association the therapeutic agent with liquid carriers, solid matrices, semi-solid carriers, finely divided solid carriers or combinations thereof, and then, if necessary, introducing or shaping the product into the desired delivery system.

When the therapeutic agents of the invention are prepared for administration, they are preferably combined with a pharmaceutically acceptable carrier, diluent or excipient to form a pharmaceutical formulation, or unit dosage form. The total active ingredients in such formulations include from 0.1 to 99.9% by weight of the formulation. A “pharmaceutically acceptable carrier” is a diluent, excipient, and/or salt that is compatible with the other ingredients of the formulation, and not deleterious to the recipient thereof. The active ingredient for administration may be present as a powder or as granules, as a solution, a suspension or an emulsion.

Pharmaceutical formulations containing the therapeutic agents of the invention can be prepared by procedures known in the art using well known and readily available ingredients.

The therapeutic agents of the invention can also be formulated as solutions appropriate for parenteral administration, for instance by intramuscular, subcutaneous or intravenous routes.

The pharmaceutical formulations of the therapeutic agents of the invention can also take the form of an aqueous or anhydrous solution or dispersion, or alternatively the form of an emulsion or suspension.

Thus, the therapeutic agent may be formulated for parenteral administration (e.g., by injection, for example, bolus injection or continuous infusion) and may be presented in unit dose form in ampules, pre-filled syringes, small volume infusion containers or in multi-dose containers with an added preservative. The active ingredients may take such forms as suspensions, solutions, or emulsions in oily or aqueous vehicles, and may contain formulatory agents such as suspending, stabilizing and/or dispersing agents. Alternatively, the active ingredients may be in powder form, obtained by aseptic isolation of sterile solid or by lyophilization from solution, for constitution with a suitable vehicle, e.g., sterile, pyrogen-free water, before use.

The pharmaceutical formulations of the present invention may include, as optional ingredients, pharmaceutically acceptable carriers, diluents, solubilizing or emulsifying agents, and salts of the type that are well-known in the art. Specific non-limiting examples of the carriers and/or diluents that are useful in the pharmaceutical formulations of the present invention include water and physiologically acceptable buffered saline solutions, such as phosphate buffered saline solutions of pH 7.0-8.0.

BRIEF DESCRIPTION OF THE FIGURES

The invention will now be described in detail with reference to a specific embodiment and with reference to the accompanying drawings, in which:

FIG. 1 shows stepwise deletion of MeCP2 protein, retaining only the two key functional domains predicted by mutational analysis.

A) Schematic of human MeCP2 protein sequence (e1 isoform) with the methyl-CpG binding domain (MBD) [residues 78-162¹²] and the NCoR/SMRT interaction domain (NID) [residues 285-309⁴]; annotated with (above): polymorphisms in healthy hemizygous males, RTT-causing missense mutations, and (below) sequence identity to chimpanzee (e1), mouse (e1), Xenopus and zebrafish homologues [sites of insertions in Xenopus and zebrafish sequences shown as longer bars].

B) Schematic of the deletion series of MeCP2 proteins (mouse e2 isoforms) expressed by the three novel mouse lines presented in this study (and WT-EGFP mice¹⁶).

C) EGFP-tagged shortened proteins were overexpressed in HeLa cells, and immunoprecipitated using GFP-TRAP beads. Western blots show expression and purification of these proteins (GFP), and co-immunoprecipitation of NCoR/SMRT co-repressor complex components (NCoR, HDAC3 and TBL1XR1). WT-EGFP and R306C-EGFP were used as controls to show the presence and absence of binding to these proteins, respectively. ‘In’=input, ‘IP’=immunoprecipitate.

D) EGFP-tagged shortened MeCP2 proteins were overexpressed in NIH-3T3 cells, which were PFA fixed and stained with DAPI. WT-EGFP and R111G-EGFP were used as controls to show focal and diffuse localisation, respectively.

E) EGFP-tagged shortened proteins were co-overexpressed with TBL1X-mCherry in NIH-3T3 cells, which were PFA fixed and stained with DAPI. WT-EGFP and R306C-EGFP were used as controls to show the presence and absence of TBL1X-mCherry recruitment to heterochromatic foci, respectively.

FIG. 2 shows the design of the MeCP2 deletion series.

A) Schematic of the genomic DNA sequences of wild-type and ΔNIC MeCP2, showing the retention of the extreme N-terminal amino acids encoded in exons 1 and 2 and the first 10 bp of exon 3, the deletion of the N- and C-termini, the replacement of the intervening region with a linker and SV40 NLS, and the addition of the C-terminal EGFP tag. Colour key: 5′UTR=white, MBD=mid-grey, NID=dark grey, uncharacterised regions=grey, SV40 NLS=mid-grey beside linker, linkers=dark grey and EGFP=C-terminal mid-grey.

B) The N-terminal ends of the sequences of all three shortened proteins (e1 and e2 isoforms) showing the fusion of the extreme N-terminal amino acids to the MBD (starting with Pro72).

C), D) Protein sequence alignment of the (C) MBD and (D) NID region using ClustalWS, shaded according to BLOSUM62 score. Both alignments are annotated with: (above) RTT-missense mutations¹³ and activity-dependent phosphorylation sites^(17,18,19); and (below) sequence conservation, interaction domains and known²⁰/predicted²¹ structure. Interaction sites: meDNA binding (residues 78-162¹²), AT hook 1 (residues 183-195²²), AT hook 2 (residues 257-272²³), NCoR/SMRT binding (residues 285-309⁴). The bipartite nuclear localisation signal (NLS) is also shown (residues 253-256 and 266-271). Residue numbers correspond to those of mammalian e2 isoforms. The regions retained in ΔNIC are: MBD residues 72-173 (highlighted by the grey rectangle in C) and NID residues 272-312 (highlighted by the grey rectangle in D).

FIG. 3 shows the constructs for the generation of ΔN and ΔNC mice.

(Upper) Diagram of (A) ΔN and (B) ΔNC mouse production. The endogenous Mecp2 allele was targeted in male ES cells. The selection cassette was removed in vivo by crossing chimaeras with deleter (CMV-Cre) mice.

(Lower) Southern blot analysis shows correct targeting of ES cells and successful cassette deletion in the knock-in mice.

FIG. 4 shows constructs for the generation of ΔNIC and STOP mice.

(Upper) Diagram of ΔNIC mouse production. The endogenous Mecp2 allele was targeted in male ES cells. The selection cassette was removed in vivo by crossing chimaeras with deleter (CMV-Cre) mice to produce constitutively expressing ΔNIC mice or retained to produce STOP mice.

(Lower) Southern blot analysis shows correct targeting of ES cells and successful cassette deletion in the ΔNIC knock-in mice.

FIG. 5 shows that ΔN and ΔNC proteins are expressed at around wild-type levels in knock-in mice.

A, Upper) Western blot analysis of crude whole brain extract showing protein sizes and levels in ΔN mice (n=3) compared to their wild-type littermates (n=3), detected using a C-terminal MeCP2 antibody.

A, Lower) Western blot analysis of ΔN (n=3) and ΔNC (n=3) mice compared to WT-EGFP controls (n=3), detected using a GFP antibody. Histone H3 was used as a loading control. * denotes a non-specific band detected by the GFP antibody.

B) Flow cytometry analysis of protein levels in nuclei prepared from whole brain (‘All’) and the high-NeuN subpopulation (‘Neurons’) in WT-EGFP (n=3), ΔN (n=3) and ΔNC (n=3) mice, detected using EGFP fluorescence. Graph shows mean±S.E.M and genotypes were compared to WT-EGFP controls by t-test: All ΔN p=0.338, ΔNC **p=0.003; and Neurons ΔN p=0.672, ΔNC *p=0.014.

FIG. 6 shows deletion of the N- and C-termini has minimal phenotypic consequence.

A), B) Phenotypic scoring of hemizygous male (A) ΔN mice (n=10) and (B) ΔNC mice (n=10) each compared to their wild-type littermates (n=10) over one year. Graphs show mean scores±S.E.M. Mecp2-null data (n=12)¹⁶ is used as a comparator.

C), D) Kaplan-Meier plots showing survival of the same cohorts in parts A and B. Mecp2-null data (n=24)¹⁶ is used as a comparator.

E), F), G) Behavioural analysis of separate cohorts performed at 20 weeks of age: ΔN (n=10) and ΔNC mice (n=10-11) each compared to their wildtype littermates (n=10). All graphs show individual values and medians, and the results of statistical analysis comparing genotypes (see below): not significant (‘n.s.’) p>0.05, *p<0.05.

E) Time spent in the closed and open arms of the Elevated Plus Maze was measured during a 15 minute trial, and genotypes were compared using KS tests: ΔN cohort (left) closed arms p=0.988 and open arms p=0.759; ΔNC cohort (right) closed arms p=0.956 and open arms p=0.932.

F) Time spent in the centre region of the Open Field test was measured during a 20 minute trial, and genotypes were compared using t-tests: ΔN p=0.822; ΔNC *p=0.020.

G) Average latency to fall from the Accelerating Rotarod in four trials was calculated for each of the three days of the experiment, and genotypes were compared using KS tests: ΔN cohort day 1 p=0.759, day 2 p=0.401 and day 3 p=0.055; ΔNC cohort day 1 p=0.988, day 2 p=0.401 and day 3 p=0.759.

FIG. 7 shows that ΔNC mice have a slightly increased weight phenotype that is background-dependent.

A), B) Growth curves of the backcrossed scoring cohorts (see FIG. 6A-D).

C) Growth curve of an outbred (75% C57BL/6J) cohort of ΔNC mice (n=7) and wild-type littermates (n=9).

A), B), C) Graphs show mean values±S.E.M. Genotypes were compared using repeated measures ANOVA: ΔN p=0.385, ΔNC ****p<0.0001, ΔNC (outbred) p=0.739. Mecp2-null data (n=20)¹⁶ is used as a comparator.

FIG. 8 shows that no activity phenotype was detected for either ΔN or ΔNC mice.

A), B) Behavioural analysis of ΔN (n=10) and ΔNC mice (n=10) each compared to their wildtype littermates (n=10) at 20 weeks of age (see FIG. 6E-G). Total distance travelled in the Open Field test was measured during a 20 minute trial. Graphs show individual values and medians, and genotypes were compared using t-tests: ΔN p=0.691; ΔNC p=0.791.

FIG. 9 shows that additional deletion of the intervening region leads to protein instability and mild RTT-like symptoms.

A) Western blot analysis of crude whole brain extract showing protein sizes and levels in ΔNIC mice (n=3) and WT-EGFP controls (n=3), detected using a GFP antibody. Histone H3 was used as a loading control. * denotes a non-specific band detected by the GFP antibody.

B) Flow cytometry analysis of protein levels in nuclei prepared from whole brain (‘All’) and the high-NeuN subpopulation (‘Neurons’) in ΔNIC mice (n=3) and WT-EGFP controls (n=3), detected using EGFP fluorescence. Graph shows mean±S.E.M and genotypes were compared by t-test: All ***p=0.0002 and Neurons ***p=0.0001.

C) Quantitative PCR analysis of mRNA levels prepared from whole brain of ΔNIC mice (n=3) and wild-type littermates (n=3). Mecp2 transcript levels were normalised to Cyclophilin A. Graph shows mean±S.E.M (relative to wild-type) and genotypes were compared by t-test: **p=0.005.

D) Phenotypic scoring of ΔNIC mice (n=10) compared to their wild-type littermates (n=10) over one year. Graph shows mean scores±S.E.M. Mecp2-null data (n=12)¹⁶ is used as a comparator.

E) Kaplan-Meier plot showing survival of the same cohort in part D. One ΔNIC animal died at 43 weeks without exceeding a severity score of 2.5. Mecp2-null data (n=24)¹⁶ is used as a comparator.

F), G), H) Behavioural analysis of separate cohorts performed at 20 weeks of age: ΔNIC (n=10) compared to their wildtype littermates (n=10). All graphs show individual values and medians, and the results of statistical analysis comparing genotypes (see below): not significant (‘n.s.’) p>0.05, *p<0.05, **p<0.01.

F) Time spent in the closed and open arms and central region of the Elevated Plus Maze was measured during a 15 minute trial, and genotypes were compared using KS tests: closed arms **p=0.003, open arms p=0.055 and centre *p=0.015.

G) Time spent in the centre region of the Open Field test was measured during a 20 minute trial, and genotypes were compared using a t-test: p=0.402.

H) Average latency to fall from the Accelerating Rotarod in four trials was calculated for each of the three days of the experiment, and genotypes were compared using KS tests: day 1 p=0.164, day 2 p=0.055 and day 3 **p=0.003. Changed performance (learning/worsening) over the three day period was determined using Friedman tests: wild-type animals p=0.601, ΔNIC animals **p=0.003.

FIG. 10 shows that outbred ΔNIC mice had 100% survival over one year.

Kaplan-Meier plot showing survival of an outbred (75% C57BL/6J) cohort of ΔNIC mice (n=10) and their wild-type littermate (n=1). Mecp2-null data (n=24)¹⁶ is used as a comparator.

FIG. 11 shows that ΔNIC mice have decreased body weight.

Growth curve of the backcrossed scoring cohort (see FIG. 9D-E). Graph shows mean±S.E.M. Genotypes were compared using repeated measures ANOVA: ****p<0.0001. Mecp2-null data (n=20)¹⁶ is used as a comparator.

FIG. 12 shows that no activity phenotype was detected for ΔNIC mice.

Behavioural analysis of ΔNIC (n=10) compared to their wildtype littermates (n=10) at 20 weeks of age (see FIG. 9F-H). Total distance travelled in the Open Field test was measured during a 20 minute trial. Graphs show individual values and medians, and genotypes were compared using a t-test: p=0.333.

FIG. 13 shows that ΔNIC mice have a less severe phenotype than the mildest mouse model of RTT, R133C.

A), B), C) Copy of phenotypic analysis of ΔNIC mice and wild-type littermates presented in FIG. 9D-E and FIG. 11 using EGFP-tagged R133C mice (n=10)¹⁶ as a comparator.

FIG. 14 shows that ‘STOP’ mice with transcriptionally silenced ΔNIC resemble Mecp2 nulls.

A) Western blot analysis of crude whole brain extract showing protein sizes and levels in STOP mice (n=3) compared to WT-EGFP (n=3) and ΔNIC controls (n=3), detected using a GFP antibody. Histone H3 was used as a loading control. * denotes a non-specific band detected by the GFP antibody.

B) Flow cytometry analysis of protein levels in nuclei prepared from whole brain (‘All’) and the high-NeuN subpopulation (‘Neurons’) in WT-EGFP (n=3), ΔNIC (n=3) and STOP (n=3) mice, detected using EGFP fluorescence. Graph shows mean±S.E.M and genotypes were compared using t-tests: **** denotes a p value<0.0001.

C) Phenotypic scoring of STOP mice (n=22) compared to Mecp2-null data (n=12)¹⁶. Graph shows mean scores±S.E.M.

D) Kaplan-Meier plot showing survival of STOP mice (n=14) compared to Mecp2-null data (n=24)¹⁶.

FIG. 15 shows that reactivation of ΔNIC successfully reverses neurological symptoms in MeCP2-deficient mice.

A) Timeline of the reversal experiment (results shown in B-C and FIG. 16).

B) Phenotypic scoring of Tamoxifen-injected mice from 4-28 weeks: WT^(T) (n=4), WT CreER^(T) (n=4), STOP^(T) (n=9) and STOP CreER^(T) (n=9). Graph shows mean scores±S.E.M.

C) Kaplan-Meier plot showing survival of the same cohort. Arrows indicate the timing of Tamoxifen injections. ^(‘T’) denotes Tamoxifen-injected animals.

D) Timeline of the AAV-mediated rescue experiment (results shown in E-F and FIG. 17).

E) Phenotypic scoring of AAV9-injected mice from 5-20 weeks: WT+vehicle (n=19), Null+vehicle (n=21) and Null+ΔNIC (n=11). Graph shows mean scores±S.E.M.

F) Kaplan-Meier plot showing survival of the same animals. An arrow indicates the timing of the viral injection.

FIG. 16 shows successful reactivation of ΔNIC in Tamoxifen-injected STOP CreER mice.

A) Southern blot analysis of genomic DNA to determine the level of recombination by CreER in Tamoxifen (‘+Tmx’)-injected STOP CreER animals (n=8). One Tamoxifen-injected STOP animal was included as a negative control showing recombination was dependent on CreER. Other samples were included for reference (see restriction map in FIG. 4).

B) Protein levels in Tamoxifen-injected STOP CreER animals were determined using western blotting (upper, n=5) and flow cytometry (lower, n=3). Constitutively expressing ΔNIC mice (n=3) were used as a comparator. Graphs show mean values±S.E.M (quantification by western blotting is shown normalised to ΔNIC). Genotypes were compared using t-tests: western blotting p=0.434; flow cytometry All nuclei p=0.128 and Neuronal nuclei *p=0.016.

FIG. 17 shows that introduction of ΔNIC into wild-type mice does not have adverse consequences.

A) Phenotypic scoring of AAV9-injected mice from 5-20 weeks: WT+vehicle (n=19), Null+vehicle (n=21) and WT+ΔNIC (n=9). Graph shows mean scores±S.E.M. An arrow indicates the timing of the viral injection.

B) Design of construct used in the vector delivery of ΔNIC. Putative regulatory elements (RE) in the extended mMeP426 promoter and endogenous distal 3′-UTR are indicated. The extent of the short 229 bp region of the murine Mecp2 endogenous core promoter that is disclosed in the art^(29,38) (mMeP229) is shown relative to the mMeP426 promoter used in this construct. The RDH1pA 3′-UTR consists of several exogenous microRNA (miR) binding sites incorporated as a ‘binding panel’ adjacent to a portion of the distal endogenous MECP2 polyadenylation signal and its accompanying regulatory elements. References with an asterisk indicate human in vitro studies, not rodent.

C) Full, annotated, sequence of the expression cassette illustrated in FIG. 17B, with flanking AAV2 ITRs. This sequence is also provided as SEQ ID NO: 65.

FIG. 18 shows an alignment of the cDNA sequence of wild type human MECP2 e1 isoform with cDNA sequences encoding polypeptide sequences in accordance with the invention and the experimental results provided herein. “Human WT” (SEQ ID NO: 18) is a cDNA sequence for the wild type MeCP2 isoform 1. “dNIC-Myc” (SEQ ID NO: 62) is a cDNA sequence for a synthetic polypeptide in accordance with the invention having deletions in the N and C-terminal sequences of MeCP2 and in the sequence linking the MBD and NID, and having a Myc tag at the C-terminus. “dNC-Myc” (SEQ ID NO: 63) is a cDNA sequence for a synthetic polypeptide in accordance with the invention having deletions in the N and C-terminal sequences of MeCP2 and having a Myc tag at the C-terminus. The sections of the cDNA sequences corresponding to the extreme N terminus of the polypeptide, providing the e1-specific sequences, the MBD, the NID, the Myc tag, a SV40 NLS, and linkers for attaching the tag and NLS, are all indicated.

FIG. 19 shows an alignment of the amino acid sequence of wild type human MeCP2 e1 isoform with polypeptide sequences in accordance with the invention and the experimental results provided herein. “Human WT” (SEQ ID NO: 3) is the amino acid sequence for the wild type MeCP2 isoform 1. “dNIC-Myc” (SEQ ID NO: 61) is the amino acid sequence for a synthetic polypeptide in accordance with the invention having deletions in the N and C-terminal sequences of MeCP2 and in the sequence linking the MBD and NID, and having a Myc tag at the C-terminus. “dNC-Myc” (SEQ ID NO: 60) is the amino acid sequence for a synthetic polypeptide in accordance with the invention having deletions in the N and C-terminal sequences of MeCP2 and having a Myc tag at the C-terminus. The sections of the amino acid sequences corresponding to the extreme N terminus of the polypeptide, having the e1-specific sequences, the MBD, the NID, the Myc tag, a SV40 NLS, and linkers for attaching the tag and NLS, are all indicated.

FIG. 20 shows an alignment of the cDNA sequence of the wild type mouse MECP2 e1 isoform, with an EGFP tag, with cDNA sequences encoding polypeptide sequences in accordance with the invention and the experimental results provided herein. “dNIC-EGFP” (SEQ ID NO: 51) is a cDNA sequence for a synthetic polypeptide in accordance with the invention having deletions in the N and C-terminal sequences of MeCP2 and in the sequence linking the MBD and NID, and having an EGFP tag at the C-terminus. “dNC-EGFP” (SEQ ID NO: 50) is a cDNA sequence for a synthetic polypeptide in accordance with the invention having deletions in the N and C-terminal sequences of MeCP2 and having an EGFP tag at the C-terminus. “WT-EGFP” (SEQ ID NO: 48) is a cDNA sequence for the wild type MeCP2 isoform 1 with an EGFP tag. “dN-EGFP” (SEQ ID NO: 49) is a cDNA sequence for a synthetic polypeptide in accordance with the invention having deletions in the N terminal sequences of MeCP2 and having an EGFP tag at the C-terminus. The sections of the cDNA sequences corresponding to the extreme N terminus of the polypeptide, providing the e1-specific sequences, the MBD, the NID, the EGFP tag, a SV40 NLS, and linkers for attaching the tag and NLS, are all indicated.

FIG. 21 shows an alignment of the amino acid sequence of wild type human MECP2 e1 isoform, with an EGFP tag, with polypeptide sequences in accordance with the invention and the experimental results provided herein. “WT-EGFP/1-748” (SEQ ID NO: 40) is an amino acid sequence for the wild type MeCP2 isoform 1 with an EGFP tag at the C-terminus. “ΔN-EGFP/1-689” (SEQ ID NO: 41) is the amino acid sequence for a synthetic polypeptide in accordance with the invention having deletions in the N terminal sequences of MeCP2 and having an EGFP tag at the C-terminus. “ΔNIC-EGFP/1-432” (SEQ ID NO: 43) is the amino acid sequence for a synthetic polypeptide in accordance with the invention having deletions in the N and C-terminal sequences of MeCP2 and in the sequence linking the MBD and NID, and having an EGFP tag at the C-terminus. “ΔNC-EGFP/1-516” (SEQ ID NO: 42) is the amino acid sequence for a synthetic polypeptide in accordance with the invention having deletions in the N and C-terminal sequences of MeCP2 and having an EGFP tag at the C-terminus. The sections of the amino acid sequences corresponding to the extreme N terminus of the polypeptide, having the e1-specific sequences, the MBD, the NID, the EGFP tag, a SV40 NLS, and linkers for attaching the tag and NLS, are all indicated.

SEQUENCES

Below are polynucleotide and amino acid sequences used in accordance with the invention.

[SEQ ID NO: 1] MeCP2 Methyl-CpG Binding Domain (MBD) polypeptide sequence
PAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGKYDVYLINPQGKAFRSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPP

[SEQ ID NO: 2] MeCP2 NCoR/SMRT Interaction Domain (NID) polypeptide sequence
PGSVVAAAAAEAKKKAVKESSIRSVQETVLPIKKRKTRETV

[SEQ ID NO: 3] Full length human wild type MeCP2 polypeptide sequence (e1 isoform)
MAAAAAAAPSGGGGGGEEERLEEKSEDQDLQGLKDKPLKFKKVKKDKKEEKEGKHEPVQPSAHHSAEPAEAGKAETSEGSGSAPAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGKYDVYLINPQGKAFRSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPPKKPKSPKAPGTGRGRGRPKGSGTTRPKAATSEGVQVKRVLEKSPGKLLVKMPFQTSPGGKAEGGGATTSTQVMVIKRPGRKRKAEADPQAIPKKRGRKPGSVVAAAAAEAKKKAVKESSIRSVQETVLPIKKRKTRETVSIEVKEVVKPLLVSTLGEKSGKGLKTCKSPGRKSKESSPKGRSSSASSPPKKEHHHHHHHSESPKAPVPLLPPLPPPPPEPESSEDPTSPPEPQDLSSSVCKEEKMPRGGSLESDGCPKEPAKTQPAVATAATAAEKYKHRGEGERKDIVSSSMPRPNREEPVDSRTPVTERVS

[SEQ ID NO: 4] Full length human wild type MeCP2 polypeptide sequence (e2 isoform)
MVAGMLGLREEKSEDQDLQGLKDKPLKFKKVKKDKKEEKEGKHEPVQPSAHHSAEPAEAGKAETSEGSGSAPAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGKYDVYLINPQGKAFRSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPPKKPKSPKAPGTGRGRGRPKGSGTTRPKAATSEGVQVKRVLEKSPGKLLVKMPFQTSPGGKAEGGGATTSTQVMVIKRPGRKRKAEADPQAIPKKRGRKPGSVVAAAAAEAKKKAVKESSIRSVQETVLPIKKRKTRETVSIEVKEVVKPLLVSTLGEKSGKGLKTCKSPGRKSKESSPKGRSSSASSPPKKEHHHHHHHSESPKAPVPLLPPLPPPPPEPESSEDPTSPPEPQDLSSSVCKEEKMPRGGSLESDGCPKEPAKTQPAVATAATAAEKYKHRGEGERKDIVSSSMPRPNREEPVDSRTPVTERVS

[SEQ ID NO: 5] MeCP2 polypeptide sequence (e1 isoform) N-terminal to the MBD
MAAAAAAAPSGGGGGGEEERLEEKSEDQDLQGLKDKPLKFKKVKKDKKEEKEGKHEPVQPSAHHSAEPAEAGKAETSEGSGSA

[SEQ ID NO: 6] MeCP2 polypeptide sequence (e2 isoform) N-terminal to the MBD
MVAGMLGLREEKSEDQDLQGLKDKPLKFKKVKKDKKEEKEGKHEPVQPSAHHSAEPAEAGKAETSEGSGSA

[SEQ ID NO: 7] MeCP2 polypeptide sequence intervening between the MBD and
NIDKKPKSPKAPGTGRGRGRPKGSGTTRPKAATSEGVQVKRVLEKSPGKLLVKMPFQTSPGGKAEGGGATTSTQVMVIKRPGRKRKAEADPQAIPKKRGRK[SEQ ID NO: 8] MeCP2 polypeptide sequence C-terminal to the NIDSIEVKEVVKPLLVSTLGEKSGKGLKTCKSPGRKSKESSPKGRSSSASSPPKKEHHHHHHHSESPKAPVPLLPPLPPPPPEPESSEDPTSPPEPQDLSSSVCKEEKMPRGGSLESDGCPKEPAKTQPAVATAATAAEKYKHRGEGERKDIVSSSMPRPNREEPVDSRTPVTERVS[SEQ ID NO: 9] Mouse e1 specific extreme N-terminus polypeptide sequenceMAAAAATAAAAAAPSGGGGGGEEERLEEK[SEQ ID NO: 10] Mouse e2 specific extreme N-terminus polypeptide sequenceMVAGMLGLREEK[SEQ ID NO: 11] Human e1 specific extreme N-terminus polypeptide sequenceMAAAAAAAPSGGGGGGEEERLEEK[SEQ ID NO: 12] Human e2 specific extreme N-terminus polypeptide sequenceMVAGMLGLREEK[SEQ ID NO: 13] ΔNC: A truncated synthetic polypeptide sequence (from humanMeCP2) PAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGKYDVYLINPQGKAFRSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPPKKPKSPKAPGTGRGRGRPKGSGTTRPKAATSEGVQVKRVLEKSPGKLLVKMPFQTSPGGKAEGGGATTSTQVMVIKRPGRKRKAEADPQAIPKKRGRKPGSVVAAAAAEAKKKAVKESSIRSVQETVLPIKKRKTRETV[SEQ ID NO: 14] ΔNIC: A truncated synthetic polypeptide sequence (from humanMeCP2) PAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGKYDVYLINPQGKAFRSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPPGSSGSSGPKKKRKVPGSVVAAAAAEAKKKAVKESSIRSVQETVLPIKKRKTRETV[SEQ ID NO: 15] ΔN mouse: A truncated synthetic polypeptide sequence (from mouseMeCP2) PAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGKYDVYLINPQGKAFRSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPPKKPKSPKAPGTGRGRGRPKGSGTGRPKAAASEGVQVKRVLEKSPGKLVVKMPFQASPGGKGEGGGATTSAQVMVIKRPGRKRKAEADPQAIPKKRGRKPGSVVAAAAAEAKKKAVKESSIRSVHETVLPIKKRKTRETVSIEVKEVVKPLLVSTLGEKSGKGLKTCKSPGRKSKESSPKGRSSSASSPPKKEHHHHHHHSESTKAPMPLLPSPPPPEPESSEDPISPPEPQDLSSSICKEEKMPRGGSLESDGCPKEPAKTQPMVATTTTVAEKYKHRGEGERKDIVSSSMPRPNREEPVDSRTPVTERVS[SEQ ID NO: 16] ΔNC mouse: A truncated synthetic polypeptide sequence (from mouseMeCP2) 
PAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGKYDVYLINPQGKAFRSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPPKKPKSPKAPGTGRGRGRPKGSGTGRPKAAASEGVQVKRVLEKSPGKLVVKMPFQASPGGKGEGGGATTSAQVMVIKRPGRKRKAEADPQAIPKKRGRKPGSVVAAAAAEAKKKAVKESSIRSVHETVLPIKKRKTRETV[SEQ ID NO: 17] ΔNIC mouse: A truncated synthetic polypeptide sequence (from mouseMeCP2) PAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGKYDVYLINPQGKAFRSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPPGSSGSSGPKKKRKVPGSVVAAAAAEAKKKAVKESSIRSVHETVLPIKKRKTRETV[SEQ ID NO: 18] Full length human wild type MeCP2 cDNA sequence (e1 isoform)ATGGCCGCCGCCGCCGCCGCCGCGCCGAGCGGAGGAGGAGGAGGAGGCGAGGAGGAGAGACTGGAAGAAAAGTCAGAAGACCAGGACCTCCAGGGCCTCAAGGACAAACCCCTCAAGTTTAAAAAGGTGAAGAAAGATAAGAAAGAAGAGAAAGAGGGCAAGCATGAGCCCGTGCAGCCATCAGCCCACCACTCTGCTGAGCCCGCAGAGGCAGGCAAAGCAGAGACATCAGAAGGGTCAGGCTCCGCCCCGGCTGTGCCGGAAGCTTCTGCCTCCCCCAAACAGCGGCGCTCCATCATCCGTGACCGGGGACCCATGTATGATGACCCCACCCTGCCTGAAGGCTGGACACGGAAGCTTAAGCAAAGGAAATCTGGCCGCTCTGCTGGGAAGTATGATGTGTATTTGATCAATCCCCAGGGAAAAGCCTTTCGCTCTAAAGTGGAGTTGATTGCGTACTTCGAAAAGGTAGGCGACACATCCCTGGACCCTAATGATTTTGACTTCACGGTAACTGGGAGAGGGAGCCCCTCCCGGCGAGAGCAGAAACCACCTAAGAAGCCCAAATCTCCCAAAGCTCCAGGAACTGGCAGAGGCCGGGGACGCCCCAAAGGGAGCGGCACCACGAGACCCAAGGCGGCCACGTCAGAGGGTGTGCAGGTGAAAAGGGTCCTGGAGAAAAGTCCTGGGAAGCTCCTTGTCAAGATGCCTTTTCAAACTTCGCCAGGGGGCAAGGCTGAGGGGGGTGGGGCCACCACATCCACCCAGGTCATGGTGATCAAACGCCCCGGCAGGAAGCGAAAAGCTGAGGCCGACCCTCAGGCCATTCCCAAGAAACGGGGCCGAAAGCCGGGGAGTGTGGTGGCAGCCGCTGCCGCCGAGGCCAAAAAGAAAGCCGTGAAGGAGTCTTCTATCCGATCTGTGCAGGAGACCGTACTCCCCATCAAGAAGCGCAAGACCCGGGAGACGGTCAGCATCGAGGTCAAGGAAGTGGTGAAGCCCCTGCTGGTGTCCACCCTCGGTGAGAAGAGCGGGAAAGGACTGAAGACCTGTAAGAGCCCTGGGCGGAAAAGCAAGGAGAGCAGCCCCAAGGGGCGCAGCAGCAGCGCCTCCTCACCCCCCAAGAAGGAGCACCACCACCATCACCACCACTCAGAGTCCCCAAAGGCCCCCGTGCCACTGCTCCCACCCCTGCCCCCACCTCCACCTGAGCCCGAGAGCTCCGAGGACCCCACCAGCCCCCCTGAGCCCCAGGACTTGAGCAGCAGCGTCTGCAAAGAGGAGAAGATGCCCAGAGGAGGCTCACTGGAGAGCGACGGCTGCCCCAAGGAGCCAGCTAAGACTCAGCCCGCGGTTGCCACCGCCGCCACGGCCGCAGAAAAGTACAAACACCGAGGGGAGGGAGAGCGCAAAGACATTGTTTCATCCTCCATGC
CAAGGCCAAACAGAGAGGAGCCTGTGGACAGCCGGACGCCCGTGACCGAGAGAGT TAGC[SEQ ID NO: 19] ΔNC: cDNA sequence of a truncated synthetic polypeptide sequence(from human MeCP2)CCGGCTGTGCCGGAAGCTTCTGCCTCCCCCAAACAGCGGCGCTCCATCATCCGTGACCGGGGACCCATGTATGATGACCCCACCCTGCCTGAAGGCTGGACACGGAAGCTTAAGCAAAGGAAATCTGGCCGCTCTGCTGGGAAGTATGATGTGTATTTGATCAATCCCCAGGGAAAAGCCTTTCGCTCTAAAGTGGAGTTGATTGCGTACTTCGAAAAGGTAGGCGACACATCCCTGGACCCTAATGATTTTGACTTCACGGTAACTGGGAGAGGGAGCCCCTCCCGGCGAGAGCAGAAACCACCTAAGAAGCCCAAATCTCCCAAAGCTCCAGGAACTGGCAGAGGCCGGGGACGCCCCAAAGGGAGCGGCACCACGAGACCCAAGGCGGCCACGTCAGAGGGTGTGCAGGTGAAAAGGGTCCTGGAGAAAAGTCCTGGGAAGCTCCTTGTCAAGATGCCTTTTCAAACTTCGCCAGGGGGCAAGGCTGAGGGGGGTGGGGCCACCACATCCACCCAGGTCATGGTGATCAAACGCCCCGGCAGGAAGCGAAAAGCTGAGGCCGACCCTCAGGCCATTCCCAAGAAACGGGGCCGAAAGCCGGGGAGTGTGGTGGCAGCCGCTGCCGCCGAGGCCAAAAAGAAAGCCGTGAAGGAGTCTTCTATCCGATCTGTGCAGGAGACCGTACTCCCCATCAAGAAGCGCAAGACCCGGGAGACGGTC[SEQ ID NO: 20] ΔNIC: cDNA sequence of a truncated synthetic polypeptide sequence(from human MeCP2)CCGGCTGTGCCGGAAGCTTCTGCCTCCCCCAAACAGCGGCGCTCCATCATCCGTGACCGGGGACCCATGTATGATGACCCCACCCTGCCTGAAGGCTGGACACGGAAGCTTAAGCAAAGGAAATCTGGCCGCTCTGCTGGGAAGTATGATGTGTATTTGATCAATCCCCAGGGAAAAGCCTTTCGCTCTAAAGTGGAGTTGATTGCGTACTTCGAAAAGGTAGGCGACACATCCCTGGACCCTAATGATTTTGACTTCACGGTAACTGGGAGAGGGAGCCCCTCCCGGCGAGAGCAGAAACCACCTGGATCCAGTGGCAGCTCTGGGCCCAAGAAAAAGCGGAAGGTGCCGGGGAGTGTGGTGGCAGCCGCTGCCGCCGAGGCCAAAAAGAAAGCCGTGAAGGAGTCTTCTATCCGATCTGTGCAGGAGACCGTACTCCCCATCAAGAAGCGCAAGACCC GGGAGACGGTC[SEQ ID NO: 21] Full length mouse wild type MeCP2 cDNA sequence (el 
isoform)ATGGCCGCCGCTGCCGCCACCGCCGCCGCCGCCGCCGCGCCGAGCGGAGGAGGAGGAGGAGGCGAGGAGGAGAGACTGGAGGAAAAGTCAGAAGACCAGGATCTCCAGGGCCTCAGAGACAAGCCACTGAAGTTTAAGAAGGCGAAGAAAGACAAGAAGGAGGACAAAGAAGGCAAGCATGAGCCACTACAACCTTCAGCCCACCATTCTGCAGAGCCAGCAGAGGCAGGCAAAGCAGAAACATCAGAAAGCTCAGGCTCTGCCCCAGCAGTGCCAGAAGCCTCGGCTTCCCCCAAACAGCGGCGCTCCATTATCCGTGACCGGGGACCTATGTATGATGACCCCACCTTGCCTGAAGGTTGGACACGAAAGCTTAAACAAAGGAAGTCTGGCCGATCTGCTGGAAAGTATGATGTATATTTGATCAATCCCCAGGGAAAAGCTTTTCGCTCTAAAGTAGAATTGATTGCATACTTTGAAAAGGTGGGAGACACCTCCTTGGACCCTAATGATTTTGACTTCACGGTAACTGGGAGAGGGAGCCCCTCCAGGAGAGAGCAGAAACCACCTAAGAAGCCCAAATCTCCCAAAGCTCCAGGAACTGGCAGGGGTCGGGGACGCCCCAAAGGGAGCGGCACTGGGAGACCAAAGGCAGCAGCATCAGAAGGTGTTCAGGTGAAAAGGGTCCTGGAGAAGAGCCCTGGGAAACTTGTTGTCAAGATGCCTTTCCAAGCATCGCCTGGGGGTAAGGGTGAGGGAGGTGGGGCTACCACATCTGCCCAGGTCATGGTGATCAAACGCCCTGGCAGAAAGCGAAAAGCTGAAGCTGACCCCCAGGCCATTCCTAAGAAACGGGGTAGAAAGCCTGGGAGTGTGGTGGCAGCTGCTGCAGCTGAGGCCAAAAAGAAAGCCGTGAAGGAGTCTTCCATACGGTCTGTGCATGAGACTGTGCTCCCCATCAAGAAGCGCAAGACCCGGGAGACGGTCAGCATCGAGGTCAAGGAAGTGGTGAAGCCCCTGCTGGTGTCCACCCTTGGTGAGAAAAGCGGGAAGGGACTGAAGACCTGCAAGAGCCCTGGGCGTAAAAGCAAGGAGAGCAGCCCCAAGGGGCGCAGCAGCAGTGCCTCCTCCCCACCTAAGAAGGAGCACCATCATCACCACCATCACTCAGAGTCCACAAAGGCCCCCATGCCACTGCTCCCATCCCCACCCCCACCTGAGCCTGAGAGCTCTGAGGACCCCATCAGCCCCCCTGAGCCTCAGGACTTGAGCAGCAGCATCTGCAAAGAAGAGAAGATGCCCCGAGGAGGCTCACTGGAAAGCGATGGCTGCCCCAAGGAGCCAGCTAAGACTCAGCCTATGGTCGCCACCACTACCACAGTTGCAGAAAAGTACAAACACCGAGGGGAGGGAGAGCGCAAAGACATTGTTTCATCTTCCATGCCAAGGCCAAACAGAGAGGAGCCTGTGGACAGCCGGACGCCCGTGACCGAGAG AGTTAGCTCT[SEQ ID NO: 22] ΔN mouse: cDNA for a truncated synthetic polypeptide sequence (frommouse MeCP2) 
CCAGCAGTGCCAGAAGCCTCGGCTTCCCCCAAACAGCGGCGCTCCATTATCCGTGACCGGGGACCTATGTATGATGACCCCACCTTGCCTGAAGGTTGGACACGAAAGCTTAAACAAAGGAAGTCTGGCCGATCTGCTGGAAAGTATGATGTATATTTGATCAATCCCCAGGGAAAAGCTTTTCGCTCTAAAGTAGAATTGATTGCATACTTTGAAAAGGTGGGAGACACCTCCTTGGACCCTAATGATTTTGACTTCACGGTAACTGGGAGAGGGAGCCCCTCCAGGAGAGAGCAGAAACCACCTAAGAAGCCCAAATCTCCCAAAGCTCCAGGAACTGGCAGGGGTCGGGGACGCCCCAAAGGGAGCGGCACTGGGAGACCAAAGGCAGCAGCATCAGAAGGTGTTCAGGTGAAAAGGGTCCTGGAGAAGAGCCCTGGGAAACTTGTTGTCAAGATGCCTTTCCAAGCATCGCCTGGGGGTAAGGGTGAGGGAGGTGGGGCTACCACATCTGCCCAGGTCATGGTGATCAAACGCCCTGGCAGAAAGCGAAAAGCTGAAGCTGACCCCCAGGCCATTCCTAAGAAACGGGGTAGAAAGCCTGGGAGTGTGGTGGCAGCTGCTGCAGCTGAGGCCAAAAAGAAAGCCGTGAAGGAGTCTTCCATACGGTCTGTGCATGAGACTGTGCTCCCCATCAAGAAGCGCAAGACCCGGGAGACGGTCAGCATCGAGGTCAAGGAAGTGGTGAAGCCCCTGCTGGTGTCCACCCTTGGTGAGAAAAGCGGGAAGGGACTGAAGACCTGCAAGAGCCCTGGGCGTAAAAGCAAGGAGAGCAGCCCCAAGGGGCGCAGCAGCAGTGCCTCCTCCCCACCTAAGAAGGAGCACCATCATCACCACCATCACTCAGAGTCCACAAAGGCCCCCATGCCACTGCTCCCATCCCCACCCCCACCTGAGCCTGAGAGCTCTGAGGACCCCATCAGCCCCCCTGAGCCTCAGGACTTGAGCAGCAGCATCTGCAAAGAAGAGAAGATGCCCCGAGGAGGCTCACTGGAAAGCGATGGCTGCCCCAAGGAGCCAGCTAAGACTCAGCCTATGGTCGCCACCACTACCACAGTTGCAGAAAAGTACAAACACCGAGGGGAGGGAGAGCGCAAAGACATTGTTTCATCTTCCATGCCAAGGCCAAACAGAGAGGAGCCTGTGGACAGCCGGACGCCCGTGACCGAGAGAGTTAGCTGT[SEQ ID NO: 23] ΔNC mouse: cDNA for a truncated synthetic polypeptide sequence(from mouse 
MeCP2)CCAGCAGTGCCAGAAGCCTCGGCTTCCCCCAAACAGCGGCGCTCCATTATCCGTGACCGGGGACCTATGTATGATGACCCCACCTTGCCTGAAGGTTGGACACGAAAGCTTAAACAAAGGAAGTCTGGCCGATCTGCTGGAAAGTATGATGTATATTTGATCAATCCCCAGGGAAAAGCTTTTCGCTCTAAAGTAGAATTGATTGCATACTTTGAAAAGGTGGGAGACACCTCCTTGGACCCTAATGATTTTGACTTCACGGTAACTGGGAGAGGGAGCCCCTCCAGGAGAGAGCAGAAACCACCTAAGAAGCCCAAATCTCCCAAAGCTCCAGGAACTGGCAGGGGTCGGGGACGCCCCAAAGGGAGCGGCACTGGGAGACCAAAGGCAGCAGCATCAGAAGGTGTTCAGGTGAAAAGGGTCCTGGAGAAGAGCCCTGGGAAACTTGTTGTCAAGATGCCTTTCCAAGCATCGCCTGGGGGTAAGGGTGAGGGAGGTGGGGCTACCACATCTGCCCAGGTCATGGTGATCAAACGCCCTGGCAGAAAGCGAAAAGCTGAAGCTGACCCCCAGGCCATTCCTAAGAAACGGGGTAGAAAGCCTGGGAGTGTGGTGGCAGCTGCTGCAGCTGAGGCCAAAAAGAAAGCCGTGAAGGAGTCTTCCATACGGTCTGTGCATGAGACTGTGCTCCCCATCAAGAAGCGCAAGACCCGGGAGACGGTC[SEQ ID NO: 24] ΔNIC mouse: cDNA for a truncated synthetic polypeptide sequence(from mouse MeCP2)CCAGCAGTGCCAGAAGCCTCGGCTTCCCCCAAACAGCGGCGCTCCATTATCCGTGACCGGGGACCTATGTATGATGACCCCACCTTGCCTGAAGGTTGGACACGAAAGCTTAAACAAAGGAAGTCTGGCCGATCTGCTGGAAAGTATGATGTATATTTGATCAATCCCCAGGGAAAAGCTTTTCGCTCTAAAGTAGAATTGATTGCATACTTTGAAAAGGTGGGAGACACCTCCTTGGACCCTAATGATTTTGACTTCACGGTAACTGGGAGAGGGAGCCCCTCCAGGAGAGAGCAGAAACCACCTGGATCCAGTGGCAGCTCTGGGCCCAAGAAAAAGCGGAAGGTGCCTGGGAGTGTGGTGGCAGCTGCTGCAGCTGAGGCCAAAAAGAAAGCCGTGAAGGAGTCTTCCATACGGTCTGTGCATGAGACTGTGCTCCCCATCAAGAAGCGCAAGACCCGGGA GACGGTC[SEQ ID NO: 25] Mouse e1 specific extreme N-terminus cDNA sequenceATGGCCGCCGCTGCCGCCACCGCCGCCGCCGCCGCCGCGCCGAGCGGAGGAGGAGGAGGAGGCGAGGAGGAGAGACTGGAGGAAAAG[SEQ ID NO: 26] Mouse e2 specific extreme N-terminus cDNA sequenceATGGTAGCTGGGATGTTAGGGCTCAGGGAGGAAAAGGGAGGAAAAG[SEQ ID NO: 27] Human e1 specific extreme N-terminus cDNA sequenceATGGCCGCCGCCGCCGCCGCCGCGCCGAGCGGAGGAGGAGGAGGAGGCGAGGAGGAGAGACTGGAAGAAAAG[SEQ ID NO: 28] Human e2 specific extreme N-terminus cDNA sequenceATGGTAGCTGGGATGTTAGGGCTCAGGGAAGAAAAG[SEQ ID NO: 29] Full length mouse wild type MeCP2 polypeptide sequence (e1 
isoform)MAAAAATAAAAAAPSGGGGGGEEERLEEKSEDQDLQGLRDKPLKFKKAKKDKKEDKEGKHEPLQPSAHHSAEPAEAGKAETSESSGSAPAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGKYDVYLINPQGKAFRSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPPKKPKSPKAPGTGRGRGRPKGSGTGRPKAAASEGVQVKRVLEKSPGKLVVKMPFQASPGGKGEGGGATTSAQVMVIKRPGRKRKAEADPQAIPKKRGRKPGSVVAAAAAEAKKKAVKESSIRSVHETVLPIKKRKTRETVSIEVKEVVKPLLVSTLGEKSGKGLKTCKSPGRKSKESSPKGRSSSASSPPKKEHHHHHHHSESTKAPMPLLPSPPPPEPESSEDPISPPEPQDLSSSICKEEKMPRGGSLESDGCPKEPAKTQPMVATTTTVAEKYKHRGEGERKDIVSSSMPRPNREEPVDSRTPVTERVS[SEQ ID NO: 30] Full length mouse wild type MeCP2 polypeptide sequence (e2 isoform)MVAGMLGLREEKSEDQDLQGLRDKPLKFKKAKKDKKEDKEGKHEPLQPSAHHSAEPAEAGKAETSESSGSAPAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGKYDVYLINPQGKAFRSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPPKKPKSPKAPGTGRGRGRPKGSGTGRPKAAASEGVQVKRVLEKSPGKLVVKMPFQASPGGKGEGGGATTSAQVMVIKRPGRKRKAEADPQAIPKKRGRKPGSVVAAAAAEAKKKAVKESSIRSVHETVLPIKKRKTRETVSIEVKEVVKPLLVSTLGEKSGKGLKTCKSPGRKSKESSPKGRSSSASSPPKKEHHHHHHHSESTKAPMPLLPSPPPPEPESSEDPISPPEPQDLSSSICKEEKMPRGGSLESDGCPKEPAKTQPMVATTTTVAEKYKHRGEGERKDIVSSSMPRPNREEPVDSRTPVTER VS[SEQ ID NO: 31] Full length mouse wild type MeCP2 cDNA sequence (e2 
isoform)ATGGTAGCTGGGATGTTAGGGCTCAGGGAGGAAAAGTCAGAAGACCAGGATCTCCAGGGCCTCAGAGACAAGCCACTGAAGTTTAAGAAGGCGAAGAAAGACAAGAAGGAGGACAAAGAAGGCAAGCATGAGCCACTACAACCTTCAGCCCACCATTCTGCAGAGCCAGCAGAGGCAGGCAAAGCAGAAACATCAGAAAGCTCAGGCTCTGCCCCAGCAGTGCCAGAAGCCTCGGCTTCCCCCAAACAGCGGCGCTCCATTATCCGTGACCGGGGACCTATGTATGATGACCCCACCTTGCCTGAAGGTTGGACACGAAAGCTTAAACAAAGGAAGTCTGGCCGATCTGCTGGAAAGTATGATGTATATTTGATCAATCCCCAGGGAAAAGCTTTTCGCTCTAAAGTAGAATTGATTGCATACTTTGAAAAGGTGGGAGACACCTCCTTGGACCCTAATGATTTTGACTTCACGGTAACTGGGAGAGGGAGCCCCTCCAGGAGAGAGCAGAAACCACCTAAGAAGCCCAAATCTCCCAAAGCTCCAGGAACTGGCAGGGGTCGGGGACGCCCCAAAGGGAGCGGCACTGGGAGACCAAAGGCAGCAGCATCAGAAGGTGTTCAGGTGAAAAGGGTCCTGGAGAAGAGCCCTGGGAAACTTGTTGTCAAGATGCCTTTCCAAGCATCGCCTGGGGGTAAGGGTGAGGGAGGTGGGGCTACCACATCTGCCCAGGTCATGGTGATCAAACGCCCTGGCAGAAAGCGAAAAGCTGAAGCTGACCCCCAGGCCATTCCTAAGAAACGGGGTAGAAAGCCTGGGAGTGTGGTGGCAGCTGCTGCAGCTGAGGCCAAAAAGAAAGCCGTGAAGGAGTCTTCCATACGGTCTGTGCATGAGACTGTGCTCCCCATCAAGAAGCGCAAGACCCGGGAGACGGTCAGCATCGAGGTCAAGGAAGTGGTGAAGCCCCTGCTGGTGTCCACCCTTGGTGAGAAAAGCGGGAAGGGACTGAAGACCTGCAAGAGCCCTGGGCGTAAAAGCAAGGAGAGCAGCCCCAAGGGGCGCAGCAGCAGTGCCTCCTCCCCACCTAAGAAGGAGCACCATCATCACCACCATCACTCAGAGTCCACAAAGGCCCCCATGCCACTGCTCCCATCCCCACCCCCACCTGAGCCTGAGAGCTCTGAGGACCCCATCAGCCCCCCTGAGCCTCAGGACTTGAGCAGCAGCATCTGCAAAGAAGAGAAGATGCCCCGAGGAGGCTCACTGGAAAGCGATGGCTGCCCCAAGGAGCCAGCTAAGACTCAGCCTATGGTCGCCACCACTACCACAGTTGCAGAAAAGTACAAACACCGAGGGGAGGGAGAGCGCAAAGACATTGTTTCATCTTCCATGCCAAGGCCAAACAGAGAGGAGCCTGTGGACAGCCGGACGCCCGTGAC CGAGAGAGTTAGC

EXPERIMENTAL RESULTS

1. Materials and Methods

Nomenclature

According to convention, all amino acid numbers given in the following refer to the e2 isoform. Numbers refer to homologous amino acids in human (NCBI accession P51608) and mouse (NCBI accession Q9Z2D6) until residue 385, where there is a two amino acid insertion in the human protein.
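As a purely illustrative aid to this numbering convention, the sketch below converts an e2 residue number to the corresponding e1 number by taking the difference in length between the isoform-specific extreme N-termini listed as SEQ ID NOs: 9-12. The helper name `e2_to_e1` is hypothetical, and the conversion is only meaningful for residues downstream of the isoform-specific N-terminal segment.

```python
# Extreme N-terminal sequences as listed in SEQ ID NOs: 9-12.
# Repeats are written with multiplication so the run lengths are explicit.
N_TERMINI = {
    ("human", "e1"): "M" + "A" * 7 + "PS" + "G" * 6 + "EEERLEEK",
    ("human", "e2"): "MVAGMLGLREEK",
    ("mouse", "e1"): "M" + "A" * 5 + "T" + "A" * 6 + "PS" + "G" * 6 + "EEERLEEK",
    ("mouse", "e2"): "MVAGMLGLREEK",
}


def e2_to_e1(residue: int, species: str = "human") -> int:
    """Convert an e2 residue number to e1 numbering (hypothetical helper).

    Only valid for residues in the sequence shared by both isoforms,
    i.e. downstream of the isoform-specific N-terminus.
    """
    offset = len(N_TERMINI[(species, "e1")]) - len(N_TERMINI[(species, "e2")])
    return residue + offset


if __name__ == "__main__":
    # R306 in e2 numbering corresponds to residue 318 in the human e1 isoform.
    print(e2_to_e1(306, "human"))
```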

Mutation Analysis

Mutational data were collected as described previously⁴: RTT-causing missense mutations were extracted from the RettBASE dataset¹³; and polymorphisms identified in healthy hemizygous males were extracted from the Exome Aggregation Consortium (ExAC) database¹⁴.

Design of Shortened MeCP2 Proteins

The MBD and NID were defined as residues 72-173 and 272-312, respectively. All three constructs retain the extreme N-terminal sequences encoded by exons 1 and 2, present in isoforms e1 and e2 respectively. They also include the first three amino acids of exon 3 (EEK) to preserve the splice acceptor site. The intervening region (I) was replaced in ΔNIC by the NLS of SV40 preceded by a flexible linker. The sequence of the NLS is PKKKRKV (SEQ ID NO: 32) (DNA sequence: CCCAAGAAAAAGCGGAAGGTG (SEQ ID NO: 33)) and of the linker is GSSGSSG (SEQ ID NO: 34) (DNA sequence: GGATCCAGTGGCAGCTCTGGG (SEQ ID NO: 35)). All three proteins were C-terminally tagged with EGFP connected by a linker. To be consistent with a previous study tagging full-length MeCP2¹⁶, the linker sequence CKDPPVAT (SEQ ID NO: 36) (DNA sequence: TGTAAGGATCCACCGGTCGCCACC (SEQ ID NO: 37)) was used to connect the C-terminus of ΔN to EGFP. To connect the NID to the EGFP tag in ΔNC and ΔNIC, the flexible GSSGSSG (SEQ ID NO: 38) linker was used instead (DNA sequence: GGGAGCTCCGGCAGTTCTGGA (SEQ ID NO: 39)). The amino acid sequences of the e1 and e2 isoforms for WT-EGFP, ΔN-EGFP, ΔNC-EGFP and ΔNIC-EGFP polypeptides are provided herein as SEQ ID NOs 40-47, respectively. The cDNA sequences of the e1 and e2 isoforms for the WT-EGFP, ΔN-EGFP, ΔNC-EGFP and ΔNIC-EGFP polypeptides are provided herein as SEQ ID NOs 48-55, respectively.
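The correspondence between the linker and NLS peptide sequences and their DNA sequences quoted above can be checked with a short translation routine. This is a minimal sketch using the standard genetic code; the function name `translate` and the table construction are illustrative choices, not part of the described methods.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    nls = translate("CCCAAGAAAAAGCGGAAGGTG")          # SEQ ID NO: 33
    linker = translate("GGATCCAGTGGCAGCTCTGGG")       # SEQ ID NO: 35
    # In ΔNIC the linker precedes the NLS between the MBD and the NID.
    print(linker + nls)
```

Translating each DNA sequence in this way gives the peptide stated for it in the text, which is a convenient sanity check when transcribing the constructs.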

For expression in cultured cells, cDNA sequences encoding e2 isoforms of the MeCP2 deletion series were synthesised (GeneArt, Thermo Fisher Scientific) and cloned into the pEGFPN1 vector (Clontech) using XhoI and NotI restriction sites (NEB). Point mutations (R111G and R306C) were inserted into the WT-EGFP plasmid using the QuikChange II XL Site-Directed Mutagenesis Kit (Agilent Technologies). Primer sequences for R111G: Forward TGGACACGAAAGCTTAAACAAGGGAAGTCTGGCC (SEQ ID NO: 56) and Reverse GGCCAGACTTCCCTTGTTTAAGCTTTCGTGTCCA (SEQ ID NO: 57); and R306C: Forward CTCCCGGGTCTTGCACTTCTTGATGGGGA (SEQ ID NO: 58) and Reverse TCCCCATCAAGAAGTGCAAGACCCGGGAG (SEQ ID NO: 59). For ES cell targeting, genomic sequences encoding exons 3 and 4 of the EGFP-tagged shortened proteins were synthesised (GeneArt, Thermo Fisher Scientific) and cloned into a previously used²⁴ targeting vector using MfeI restriction sites (NEB). This vector contains a Neomycin resistance gene followed by a transcriptional ‘STOP’ cassette flanked by LoxP sites (‘floxed’) in intron 2.
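Each mutagenesis primer pair listed above consists of two fully complementary oligonucleotides, so the reverse primer should equal the reverse complement of the forward primer. The following sketch checks this for both pairs; the dictionary layout and function names are illustrative assumptions, and the key for the NID mutant follows the R306C designation used in the Results.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Primer pairs as (forward, reverse), SEQ ID NOs: 56-59.
PRIMERS = {
    "R111G": (
        "TGGACACGAAAGCTTAAACAAGGGAAGTCTGGCC",
        "GGCCAGACTTCCCTTGTTTAAGCTTTCGTGTCCA",
    ),
    "R306C": (
        "CTCCCGGGTCTTGCACTTCTTGATGGGGA",
        "TCCCCATCAAGAAGTGCAAGACCCGGGAG",
    ),
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_complementary_pair(forward: str, reverse: str) -> bool:
    """True if the two primers anneal to each other over their full length."""
    return reverse_complement(forward) == reverse


if __name__ == "__main__":
    for name, (fwd, rev) in PRIMERS.items():
        print(name, is_complementary_pair(fwd, rev))
```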

For viral delivery of shortened MeCP2 proteins, Myc epitope tagged proteins were prepared. The amino acid sequences of human ΔNC-Myc and ΔNIC-Myc polypeptides are provided herein as SEQ ID NOs 60-61, respectively. The cDNA sequences of human ΔNC-Myc and ΔNIC-Myc polypeptides are provided herein as SEQ ID NOs 62-63, respectively.

Cell Culture

HeLa and NIH-3T3 cells were grown in DMEM (Gibco) supplemented with 10% foetal bovine serum (FBS; Gibco) and 1% Penicillin-Streptomycin (Gibco). ES cells were grown in Glasgow MEM (Gibco) supplemented with foetal bovine serum (FBS; Gibco, batch tested), 1% Non-essential amino acids (Gibco), 1% Sodium Pyruvate (Gibco), 0.1% β-mercaptoethanol (Gibco) and LIF (ESGRO).

Immunoprecipitation

HeLa cells were transfected with pEGFPN1-MeCP2 plasmids using JetPEI (PolyPlus Transfection) and harvested after 24-48 hours. Nuclear extracts were prepared using Benzonase (Sigma E1014-25KU) and 150 mM NaCl, and MeCP2-EGFP complexes were captured using GFP-Trap_A beads (Chromotek) as described previously⁴. Proteins were analysed by western blotting using antibodies to GFP (NEB #2956), NCoR (Bethyl A301-146A), HDAC3 (Sigma 3E11) and TBL1XR1 (Bethyl A300-408A), all at a dilution of 1:1000; followed by LI-COR secondary antibodies: IRDye® 800CW Donkey anti-Mouse (926-32212) and IRDye® 800CW Donkey anti-Rabbit (926-32213) or IRDye® 680LT Donkey anti-Rabbit (926-68023) at a dilution of 1:10,000.

Recruitment Assay

NIH-3T3 cells were seeded on coverslips in 6-well plates (25,000 cells per well) and transfected with 2 μg plasmid DNA (pEGFPN1-MeCP2 and pmCherry-TBL1X) using JetPEI (PolyPlus Transfection). After 48 hours, cells were fixed with 4% (w/v) paraformaldehyde, stained with DAPI (Sigma) and then mounted using ProLong Diamond (Life Technologies). Fixed cells were photographed using confocal microscopy (Leica SP5).

Generation of Knock-In Mice

Targeting vectors were introduced into 129/Ola E14 TG2a ES cells by electroporation, and G418-resistant clones with correct targeting at the Mecp2 locus were identified by PCR and Southern blot screening. CRISPR-Cas9 technology was used to increase the targeting efficiency of ΔN and ΔNIC lines: the guide RNA sequence (GGTTGTGACCCGCCATGGAT) (SEQ ID NO: 64) was cloned into pX330-U6-Chimeric_BB-CBh-hSpCas9 (a gift from Feng Zhang; Addgene plasmid #42230²⁵), which was introduced into the ES cells with the targeting vectors. This introduced a double-strand cut in intron 2 of the wild-type gene (at the site of the NeoSTOP cassette in the targeting vector). Mice were generated from ES cells as previously described²⁶. The ‘floxed’ NeoSTOP cassette was removed in vivo by crossing chimaeras with homozygous females from the transgenic CMV-Cre deleter strain (JAX Stock #006054) on a C57BL/6J background. The CMV-Cre transgene was subsequently bred out. All mice used in this study were bred and maintained at the University of Edinburgh animal facilities under standard conditions, and procedures were carried out by staff licensed by the UK Home Office and in accordance with the Animals (Scientific Procedures) Act 1986.

Biochemical Characterisation of Knock-In Mice

For biochemical analysis, brains were harvested by snap-freezing in liquid nitrogen at 6-13 weeks of age, unless otherwise stated. Brains of hemizygous male mice were used for all analysis, unless otherwise stated. For Southern blot analysis, half brains were homogenised in 50 mM Tris Cl pH 7.5, 100 mM NaCl, 5 mM EDTA and treated with 0.4 mg/ml Proteinase K in 1% SDS at 55° C. overnight. Samples were treated with 0.1 mg/ml RNAseA for 1-2 hours at 37° C., before phenol:chloroform extraction of genomic DNA. Genomic DNA was purified from ES cells using Puregene Core Kit A (Qiagen) according to manufacturer's instructions for cultured cells. Genomic DNA was digested with restriction enzymes (NEB), separated by agarose gel electrophoresis and transferred onto ZetaProbe membranes (BioRad). DNA probes homologous to either exon 4 or the end of the 3′ homology arm were radioactively labelled with [α-³²P]dCTP (Perkin Elmer) using the Prime-a-Gene Labeling System (Promega). Blots were probed overnight, washed, and exposed in Phosphorimager cassettes before scanning on a Typhoon FLA 7000. Bands were quantified using ImageQuant software.

Protein levels in whole brain crude extracts were quantified using western blotting. Extracts were prepared as described previously¹⁶, and blots were probed with antibodies to GFP (NEB #2956) or MeCP2 (Sigma M6818), both at a dilution of 1:1,000, followed by LI-COR secondary antibodies (listed above). Histone H3 (Abcam ab1791) was used as a loading control (dilution 1:10,000). Levels were quantified using Image Studio Lite Ver 4.0 software and compared using t-tests. WT-EGFP mice¹⁶ were used as controls.

For flow cytometry analysis, fresh brains were harvested from 12-week-old animals and Dounce-homogenised in 5 ml homogenisation buffer (320 mM sucrose, 5 mM CaCl2, 3 mM Mg(Ac)2, 10 mM Tris HCl pH 7.8, 0.1 mM EDTA, 0.1% NP40, 0.1 mM PMSF, 14.3 mM β-mercaptoethanol, protease inhibitors (Roche)), and 5 ml of 50% OptiPrep gradient centrifugation medium (50% Optiprep (Sigma D1556-250ML), 5 mM CaCl2, 3 mM Mg(Ac)2, 10 mM Tris HCl pH 7.8, 0.1 mM PMSF, 14.3 mM β-mercaptoethanol) was added. This was layered on top of 10 ml of 29% OptiPrep solution (v/v in H2O) in Ultra clear Beckman Coulter centrifuge tubes, and samples were centrifuged at 7,500 rpm for 30 mins, 4° C. Pelleted nuclei were resuspended in Resuspension buffer (20% glycerol in DPBS with protease inhibitors (Roche)). For flow cytometry analysis, nuclei were pelleted at 600×g (5 mins, 4° C.), washed in 1 ml PBTB (5% (w/v) BSA, 0.1% Triton X-100 in DPBS with protease inhibitors (Roche)), and then resuspended in 250 μl PBTB. To stain for NeuN, 10 μl of NeuN-A60 antibody (Millipore MAB377) was conjugated to Alexa Fluor 647 (APEX Antibody Labelling Kit, Invitrogen A10475), added at a dilution of 1:125 and incubated under rotation for 45 mins at 4° C. Flow cytometry (BD LSRFortessa SORP) was used to obtain the mean EGFP expression for the total nuclei (n=50,000 per sample) and the high NeuN (neuronal) subpopulation (n>8,000 per sample), and genotypes were compared using t-tests. WT-EGFP mice¹⁶ were used as controls.

To determine mRNA levels, RNA was purified and reverse transcribed from half brains; and Mecp2 and Cyclophilin A transcripts were analysed by qPCR as previously described¹⁶. mRNA levels in ΔNIC mice were compared to wild-type littermates using a t-test.

Phenotypic Characterisation of Knock-In Mice

Consistent with a previous study¹⁶, mice were backcrossed four generations to reach ~94% C57BL/6J before undergoing phenotypic characterisation. Two separate cohorts, each consisting of 10 mutant animals and 10 wild-type littermates, were produced for each novel knock-in line. One cohort was scored and weighed regularly from 4-52 weeks of age as previously described^(24,27). Survival was graphed using Kaplan-Meier plots. (A preliminary outbred [75% C57BL/6J] cohort of 7 ΔNC mice and 9 wild-type littermates was also scored.) The second backcrossed cohort underwent behavioural analysis at 20-21 weeks of age (see ²⁷ and ¹⁶ for detailed protocols). Tests were performed over a two-week period: Elevated Plus Maze on day 1, Open Field test on day 2, and Accelerating Rotarod test on days 6-9 (one day of training followed by three days of trials). All analysis was performed blind to genotype.
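The background percentages quoted for the cohorts follow from simple halving arithmetic. The sketch below assumes, for illustration only, that the founder animals carry no C57BL/6J genome and that every generation is a cross to a pure C57BL/6J partner; the function name is hypothetical.

```python
def expected_c57bl6j_fraction(generations: int) -> float:
    """Expected C57BL/6J genome fraction after a number of crosses.

    Assumes the founder contributes 0% C57BL/6J and each generation is
    crossed to pure C57BL/6J, so the non-C57BL/6J share halves each time.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return 1.0 - 0.5 ** generations


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, f"{expected_c57bl6j_fraction(n):.2%}")
```

Under these assumptions, two generations give 75%, three give 87.5% and four give 93.75%, which rounds to the ~94% stated for the backcrossed cohorts.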

Statistical Analysis

Growth curves were compared using repeated measures ANOVA. For behavioural analysis, when all data fitted a normal distribution (Open Field centre time and distance travelled), genotypes were compared using t-tests. If not (Elevated Plus Maze time in arms and Accelerating Rotarod latency to fall), genotypes were compared using Kolmogorov-Smirnov tests. Change in performance over time in the Accelerating Rotarod test was determined using Friedman tests.
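For readers unfamiliar with the Kolmogorov-Smirnov comparison used for non-normally distributed measures, the following sketch computes the two-sample test statistic, that is, the largest gap between the two empirical cumulative distribution functions. It is an illustration only: the source does not state which software was used, the function name is hypothetical, and no p-value is computed here.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the empirical CDFs of
    the two samples, evaluated at every observed value.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


if __name__ == "__main__":
    # Made-up numbers purely to show the call; not data from this study.
    print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```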

Genetic Reactivation of Minimal MeCP2 (ΔNIC)

Transcriptionally silent minimal MeCP2 (ΔNIC) was reactivated in symptomatic null-like ‘STOP’ mice following the procedure used in ²⁷. In short, the ΔNIC Mecp2 allele was inactivated by the retention of the NeoSTOP cassette in intron 2 by mating chimaeras with wild-type females instead of deleter mice. Resulting STOP/+ females were crossed with heterozygous Cre-ER transgenic males (JAX Stock #004682) to produce males of four genotypes (87.5% C57BL/6J). A cohort consisting of all four genotypes, WT (n=4), WT CreER (n=4), STOP (n=9) and STOP CreER (n=9), was scored and weighed weekly from 4 weeks of age. From 6 weeks (when STOP and STOP CreER mice displayed RTT-like symptoms), all individuals were given a series of Tamoxifen injections: two weekly followed by five daily, each at a dose of 100 μg/g body weight. Brain tissue from Tamoxifen-treated STOP CreER (n=8), WT (n=1) and WT CreER (n=1) animals was harvested at 28 weeks of age (after successful symptom reversal in STOP CreER mice) for biochemical analysis. Brain tissue from one Tamoxifen-treated STOP mouse was also included in the biochemical analysis (methods described above).

Vector Delivery of Minimal MeCP2 (ΔNIC)

Minimal MeCP2 (ΔNIC) AAV vector was tested in Mecp2-null and WT mice maintained on a C57BL/6 background. Recombinant AAV vector particles were generated at the UNC Gene Therapy Center Vector Core facility. Self-complementary AAV (scAAV) particles (AAV2 ITR-flanked genomes packaged into AAV9 capsids) were produced from suspension HEK293 cells transfected using polyethyleneimine (Polysciences, Warrington, Pa.) with helper plasmids (pXX6-80, pGSK2/9) and a plasmid containing the ITR-flanked ΔNIC transgene construct. The construct used is illustrated in FIG. 17B, and the annotated sequence (SEQ ID NO: 65) of the ITR-flanked ΔNIC transgene construct is shown in FIG. 17C. For translational relevance, the ΔNIC-expressing construct utilized the equivalent human MECP2 e1 coding sequence, with a small C-terminal Myc epitope tag replacing the EGFP tag used in other experiments. The transgene was under the control of an extended endogenous Mecp2 promoter fragment (MeP426) incorporating additional promoter regulatory elements and a putative silencer element (FIGS. 17B,C). The construct also incorporated a novel 3′-UTR consisting of a fragment of the endogenous MECP2 3′UTR together with a selected panel of binding sites for miRNAs known to be involved in regulation of Mecp2³⁹⁻⁴¹ (FIGS. 17B,C). Virus production was performed as previously described²⁸, and vector was prepared in a final formulation of high-salt PBS (containing 350 mM total NaCl) supplemented with 5% sorbitol. For brain injection into mice, direct bilateral injections of virus (3 μl per site; dose=1×10¹¹ viral genomes per mouse) were delivered into the neuropil of unanaesthetised P1/2 males, as described previously²⁹. Control injections were made using the same diluent lacking vector (‘vehicle control’). The injected pups were returned to the home cage and assessed weekly as described above.
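As a rough worked example of the dosing arithmetic, the vector concentration implied by the stated dose and injection volume can be estimated as below. This assumes that "bilateral" means exactly two injection sites per mouse, which the text does not state explicitly; the function name and the default of two sites are therefore illustrative assumptions.

```python
def implied_titre_vg_per_ml(total_vg: float,
                            volume_per_site_ul: float,
                            n_sites: int = 2) -> float:
    """Vector concentration (vg/ml) implied by a total dose and volume.

    n_sites=2 is an assumption standing in for 'bilateral' injection.
    """
    if volume_per_site_ul <= 0 or n_sites <= 0:
        raise ValueError("volume and number of sites must be positive")
    total_volume_ul = volume_per_site_ul * n_sites
    return total_vg / total_volume_ul * 1000.0  # convert per-microlitre to per-ml


if __name__ == "__main__":
    # 1e11 viral genomes in 2 x 3 microlitres.
    print(f"{implied_titre_vg_per_ml(1e11, 3.0):.2e} vg/ml")
```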

2. Results

The amino acid sequence of MeCP2 is highly conserved throughout vertebrate species (FIG. 1A), suggesting that most of the protein is subject to purifying selection. This supports the widely-held view that its interactions with multiple binding partners are of functional importance; through these interactions MeCP2 has been implicated in several cellular pathways required for proper neuronal function^(11,3). An alternative picture emerges when analysing the distribution of RTT-causing missense mutations, which highlights only the MBD and NID, a small minority of the protein, as critical (FIG. 1A). Furthermore, exome sequencing data collected from healthy individuals show a large number of polymorphisms in the other regions of the protein (FIG. 1A), suggesting these sequences are dispensable. To test whether the MBD and NID might be sufficient for MeCP2 function, we designed a stepwise series of deletions of the endogenous gene to remove regions N-terminal to the MBD (ΔN), C-terminal to the NID (ΔC) and the intervening amino acids between these domains (ΔI) (FIG. 1B). The intervening region was replaced by a nuclear localisation signal (NLS) sequence derived from SV40 virus, connected by short linkers. The Mecp2 gene has four exons, with transcripts alternatively spliced to produce two isoforms that differ only at the extreme N-termini³⁰. To maintain the Mecp2 gene structure in the knock-in mice, the constructs retained exons 1 and 2 as well as the first 10 bp of exon 3 (splice acceptor site), resulting in the inclusion of 29 and 12 N-terminal amino acids for isoforms e1 and e2, respectively (FIGS. 2A-B, 3, 4). A C-terminal EGFP tag was added to facilitate detection and recovery, as tagging does not affect MeCP2 function in mice¹⁶ (FIG. 1B). Taking into account mapped binding sites, structural information and evolutionary conservation, we defined the MBD as residues 72-173 and the NID as residues 272-312 (FIG. S1C-D). The proportion of native MeCP2 protein sequence retained in ΔN, ΔNC and ΔNIC is 88%, 52% and 32% of wild-type, respectively.
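The retained proportions quoted above can be reproduced from the stated domain boundaries. The following is a minimal sketch, not the authors' calculation. It assumes e2 residue numbering, a full-length e2 protein of 486 residues (a figure not given in this section), and that the 12 retained N-terminal residues count as native sequence while the SV40 NLS, linkers and EGFP tag do not. All variable and function names are illustrative.

```python
# Domain boundaries quoted in the text (inclusive residue numbering).
MBD = (72, 173)
NID = (272, 312)
N_TERMINAL_RETAINED_E2 = 12  # N-terminal residues kept for isoform e2

# Assumption: full-length e2 length of 486 residues (not stated in this section).
FULL_LENGTH_E2 = 486


def span(start, end):
    """Number of residues in an inclusive range."""
    return end - start + 1


native = {
    # Delta-N: retained N-terminal residues plus everything from the MBD onward
    "deltaN": N_TERMINAL_RETAINED_E2 + span(MBD[0], FULL_LENGTH_E2),
    # Delta-NC: additionally truncated after the end of the NID
    "deltaNC": N_TERMINAL_RETAINED_E2 + span(MBD[0], NID[1]),
    # Delta-NIC: only the MBD and NID remain (NLS and linkers are not native)
    "deltaNIC": N_TERMINAL_RETAINED_E2 + span(*MBD) + span(*NID),
}

percent = {name: round(100 * n / FULL_LENGTH_E2) for name, n in native.items()}
print(percent)  # {'deltaN': 88, 'deltaNC': 52, 'deltaNIC': 32}
```

Under these assumptions the native residue counts are 427, 253 and 155, which round to the 88%, 52% and 32% reported in the text.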

We first tested whether the shortened MeCP2 proteins retained the ability to interact with methylated DNA and the NCoR/SMRT co-repressor complex using cell culture-based assays. All three protein derivatives immunoprecipitated endogenous NCoR/SMRT complex components when overexpressed in HeLa cells, whereas this interaction was abolished in the negative control NID mutant, R306C (FIG. 1C). To assay mCpG binding, we asked whether expressed proteins localised to mCpG-rich pericentric heterochromatic foci in mouse fibroblasts. Previous work established that localisation of wild-type MeCP2 to these foci is dependent on both DNA methylation^(31,32) and MBD functionality³³. All three shortened versions of MeCP2 localised to heterochromatic foci, whereas a negative control MBD mutant (R111G) showed a diffuse nuclear distribution (FIG. 1D). To determine whether the shortened proteins could bind chromatin and the NCoR/SMRT complex simultaneously, we asked if they were able to recruit TBL1X, an NCoR/SMRT subunit that binds directly to MeCP2⁴, to heterochromatin. Over-expressed TBL1X-mCherry lacks an NLS and is therefore cytoplasmic, but in the presence of over-expressed MeCP2 it is efficiently recruited to heterochromatic foci⁴. All shortened MeCP2 proteins likewise recruited TBL1X to the heterochromatic foci, demonstrating their ability to bridge DNA with the co-repressor (FIG. 1E). The MeCP2 NID mutant control (R306C) itself localised correctly, but as described previously⁴ was unable to relocate TBL1X from the cytoplasm (FIG. 1E). These three assays confirm that all shortened proteins retain the ability to bind methylated DNA and the NCoR/SMRT complex and form a bridge between them.

We initially generated ΔN and ΔNC knock-in mice by replacing the endogenous Mecp2 allele in ES cells followed by blastocyst injection and germ line transmission (FIG. 3). These truncated proteins were expressed at approximately wild-type levels in whole brain and in neurons as determined by western blot and flow cytometry analyses (FIG. 5A-B). To assess the phenotype of these truncations, knock-in mice were crossed onto a C57BL/6J background and cohorts underwent weekly phenotypic scoring^(24,27) or behavioural analysis. Both ΔN and ΔNC hemizygous male mice were viable, fertile and showed phenotypic scores indistinguishable from their wild-type littermates over the course of a year (FIG. 6A-D). ΔN mice had no body weight phenotype (FIG. 7A), whereas ΔNC mice displayed a slight increase in weight compared to wild-type littermates (FIG. 7B, repeated measures ANOVA p<0.0001). The weight difference was absent in a more outbred (75% C57BL/6J) cohort of ΔNC mice (FIG. 7C), consistent with previous observations that body weight phenotypes in RTT models are affected by genetic background²⁶.

At 20 weeks of age, separate cohorts were tested for behaviours commonly reported in RTT models: hypoactivity, decreased anxiety and reduced motor abilities. No activity phenotype (analysed by total distance travelled in the Open Field test) was detected for either the ΔN or ΔNC mice (FIG. 8). No anxiety phenotype (analysed by increased time spent in the open arms of the Elevated Plus Maze) was detected for either novel mouse line (FIG. 6E). The ΔNC mice did, however, spend significantly more time than their wild-type littermates in the central square of the Open Field arena (FIG. 6F), indicative of mildly decreased anxiety. Motor coordination was assessed using the Accelerating Rotarod test over three days. Whereas mouse models of RTT show impaired performance in this test that is most striking on the third day^(34,16), ΔN and ΔNC mice were not significantly different from their wild-type littermates on any of the three days (FIG. 6G). Overall, the results suggest that contributions of the N- and C-terminal domains to MeCP2 function are at best subtle. This result is particularly remarkable given RTT-like symptoms in male mice expressing a slightly more severe C-terminal truncation, which lacks residues beyond T308³⁵. The difference in phenotype may be explained by retention of full NID function in ΔNC mice, as previous evidence indicates that loss of the further four C-terminal amino acids (309-312) reduces the affinity of this truncated MeCP2 molecule for the NCoR/SMRT co-repressor complex⁴.

We next replaced the endogenous Mecp2 gene with ΔNIC, the minimal allele, containing only the MBD and NID domains and comprising 32% of the full-length protein sequence (FIGS. 1B, 4). Protein levels in whole brain were quantified by western blotting and flow cytometry, both of which showed reduced abundance (~50% of WT-EGFP controls; FIG. 9A-B). A similar reduction in protein abundance was also seen in the neuronal subpopulation (~40% of WT-EGFP controls; FIG. 9B). Low protein levels were not due to transcriptional silencing, as mRNA was in fact more abundant in ΔNIC mice than in wild-type littermates (FIG. 9C), suggesting that deletion of the intermediate region compromises protein stability. Despite low protein levels, male ΔNIC mice had a normal lifespan (FIGS. 9E, 10). Phenotypic scoring over one year detected mild neurological phenotypes (FIG. 9D), predominantly gait abnormalities and partial hind-limb clasping. These symptoms persisted throughout the scoring period, but did not become more severe. ΔNIC mice also weighed ~40% less than their wild-type siblings (FIG. 11A; repeated measures ANOVA p<0.0001). As seen in this study, both increases and decreases in body weight have been previously reported in MeCP2-mutant mouse models^(26,36,23,16). Behavioural analysis of a separate cohort at 20 weeks showed decreased anxiety in male ΔNIC mice, as evidenced by the significantly reduced time spent in the closed arms of the Elevated Plus Maze (FIG. 9F, KS test p=0.003). This result was not supported by the Open Field test (FIG. 9G), which also detected no activity phenotype (FIG. 12). Consistent with the gait defects detected in the scoring cohort, ΔNIC mice had reduced motor coordination, shown by declining performance over three daily trials on the Accelerating Rotarod (FIG. 9H, Friedman test p=0.003). This resulted in significantly impaired performance on the third day of testing compared to wild-type littermates (KS test p=0.003). Overall, it is noteworthy that ΔNIC animals are much less severely affected than male mice with the mildest common mutation found in RTT patients, R133C, which had a median lifespan of 42 weeks, higher symptomatic scores and a stronger reduced weight phenotype¹⁶ (FIG. 13). This result strongly supports our hypothesis that recruitment of the NCoR/SMRT co-repressor complex to chromatin is the primary function of MeCP2, with the mild phenotype observed being a likely consequence of reduced protein levels, as previously described for hypomorphic mice that express full-length MeCP2 at 50% of wild-type levels³⁷.

To further test the functionality of minimal MeCP2, we asked whether late provision of ΔNIC via genetic reactivation could reverse phenotypic defects in symptomatic MeCP2-deficient mice, as has previously been shown for the full-length protein²⁴. We generated null-like MeCP2-deficient mice by preventing ΔNIC expression with a floxed transcriptional STOP cassette in intron 2 (FIGS. 4, 14). These mice were crossed with mice carrying a CreER transgene (Cre recombinase fused to a modified estrogen receptor) to enable reactivation upon Tamoxifen treatment. This was induced after the onset of symptoms in STOP CreER mice (FIG. 15A), resulting in high levels of Cre recombination (FIG. 16A) and protein levels similar to ΔNIC mice (FIG. 16B). ΔNIC gene reactivation had a dramatic effect on phenotypic progression, ameliorating neurological symptoms and restoring normal survival (FIG. 15B-C). In contrast, STOP mice lacking the CreER transgene failed to survive beyond 26 weeks. Thus, despite its radically reduced length and relatively low abundance, ΔNIC was able to effectively reverse the RTT-like phenotype in MeCP2-deficient mice.

This finding prompted us to explore whether ΔNIC could be used for gene therapy, which we tested in Mecp2-null mice. The ΔNIC gene, driven by a minimal Mecp2 promoter, was tagged with a Myc epitope (in place of the much larger EGFP) and packaged into an adeno-associated viral vector (AAV9). Neonatal mice (P1-2) were injected intra-cranially with this virus or the AAV vehicle alone (FIG. 15D). Mecp2-null animals receiving the ΔNIC gene showed greatly reduced symptom severity and enhanced survival (FIG. 15E-F). Despite the lack of fine control over infection rate per brain cell, we did not observe deleterious effects due to over-expression, even in wild-type animals (FIG. 17). It is conceivable that toxicity is mitigated by the moderate instability and/or reduced activity of ΔNIC protein. This experiment also shows that ΔNIC protein is functional without the large EGFP tag. The use of minimal MeCP2 could provide a therapeutic advantage due to the restricted capacity of AAV vectors. Shortening the coding sequence creates room for additional regulatory sequences, enabling better control of expression levels.

3. Discussion

Overall, our results argue against the view that MeCP2 functions as a multifunctional hub, and instead support a simpler model whereby its predominant function is to recruit the NCoR/SMRT co-repressor complex to methylated sites on chromatin. It is noteworthy that the minimal MeCP2 protein (ΔNIC) is missing all or part of several domains that have been highlighted as potentially important, including the AT-hooks²³, several activity-dependent phosphorylation sites^(17,18), an RNA binding motif⁶ and interaction sites for proteins implicated in micro-RNA processing⁹, splicing¹⁰ and chromatin remodelling⁸. Importantly, our discovery that these two domains are sufficient to restore neuronal function to MeCP2-deficient mice has allowed us to show the therapeutic potential of the minimal protein.

4. Additional Experiment

The appearance of toxicity in the form of motor dysfunction, ataxia and apparent loss of proprioception when full length Mecp2/MECP2 is delivered to mice has been reported previously^(49,50). An independent report has recently shown an identical stereotyped ataxia and loss of proprioception in response to delivery of the AAV9 variant (AAVhu68) in larger mammalian species and has shown that the peripheral nerve dorsal root ganglia may be especially susceptible to AAV9 variant dosing⁵¹.

We have performed a direct comparison of full length MECP2 and the ΔNIC MECP2 (Table 3). We observed a significant reduction of ataxia and proprioception dysfunction in the AAV9 ΔNIC MECP2-treated animals compared to mice treated with full length MECP2. These data indicate that, under identical conditions, the ΔNIC MECP2 minigene confers reduced susceptibility to known peripheral neurotoxicity compared to full length MeCP2.

TABLE 3
Comparison of full length MECP2 and the ΔNIC MECP2

AAV9-MeP229/MECP2 refers to an AAV2/9 vector having the strong 229 bp fragment of endogenous Mecp2 promoter and full length MECP2; AAV9-MeP426/MECP2 refers to an AAV2/9 vector having the 426 bp fragment of endogenous Mecp2 promoter and full length MECP2; AAV9-MeP426/ΔNIC MECP2 refers to an AAV2/9 vector having the 426 bp fragment of endogenous Mecp2 promoter and the ΔNIC MECP2 minigene insert.

Vector                    Toxicity
AAV9-MeP229/MECP2         Severe ataxia/loss of proprioception in 100% of treated mice
AAV9-MeP426/MECP2         Mild ataxia/loss of proprioception/clasping in 100% of treated mice
AAV9-MeP426/ΔNIC MECP2    Mild ataxia/loss of proprioception/clasping in <25% (4 of 17) of treated mice

REFERENCES

1. Kinde, B. et al. DNA methylation in the gene body influences MeCP2-mediated gene repression. Proc. Natl. Acad. Sci. U.S.A. 113, 15114-15119 (2016).

2. Adams, V. H. et al. Intrinsic Disorder and Autonomous Domain Function in the Multifunctional Nuclear Protein, MeCP2. J. Biol. Chem. 282, 15057-64 (2007).

3. Lyst, M. J. & Bird, A. Rett syndrome: a complex disorder with simple roots. Nat. Rev. Genet. 16, 261-274 (2015).

4. Lyst, M. J. et al. Rett syndrome mutations abolish the interaction of MeCP2 with the NCoR/SMRT co-repressor. Nat. Neurosci. 16, 898-902 (2013).

5. Chahrour, M. et al. MeCP2, a key contributor to neurological disease, activates and represses transcription. Science 320, 1224-9 (2008).

6. Jeffery, L. & Nakielny, S. Components of the DNA methylation system of chromatin control are RNA-binding proteins. J. Biol. Chem. 279, 49479-49487 (2004).

7. Nan, X. et al. Interaction between chromatin proteins MECP2 and ATRX is disrupted by mutations that cause inherited mental retardation. Proc. Natl. Acad. Sci. U.S.A. 104, 2709-14 (2007).

8. Agarwal, N. et al. MeCP2 interacts with HP1 and modulates its heterochromatin association during myogenic differentiation. Nucleic Acids Res. 35, 5402-8 (2007).

9. Cheng, T.-L. et al. MeCP2 suppresses nuclear microRNA processing and dendritic growth by regulating the DGCR8/Drosha complex. Dev. Cell 28, 547-60 (2014).

10. Young, J. I. et al. Regulation of RNA splicing by the methylation-dependent transcriptional repressor methyl-CpG binding protein 2. Proc. Natl. Acad. Sci. U.S.A. 102, 17551-8 (2005).

11. Ragione, F. Della, Vacca, M., Fioriniello, S., Pepe, G. & Esposito, M. D. MECP2, a multi-talented modulator of chromatin architecture. Brief. Funct. Genomics 15, 1-12 (2016).

12. Nan, X., Meehan, R. R. & Bird, A. Dissection of the methyl-CpG binding domain from the chromosomal protein MeCP2. Nucleic Acids Res. 21, 4886-4892 (1993).

13. RettBase: Rett Syndrome Variation Database. at <http://mecp2.chw.edu.au/>

14. Exome Aggregation Consortium (ExAC), Cambridge, Mass. at <http://exac.broadinstitute.org>

15. Dinca, A. et al. Intracellular delivery of proteins with cell-penetrating peptides for therapeutic uses in human disease. Int. J. Mol. Sci. 17, 263 (2016).

16. Brown, K. et al. The molecular basis of variable phenotypic severity among common missense mutations causing Rett syndrome. Hum. Mol. Genet. 25, 558-570 (2016).

17. Zhou, Z. et al. Brain-Specific Phosphorylation of MeCP2 Regulates Activity-Dependent Bdnf Transcription, Dendritic Growth, and Spine Maturation. Neuron 52, 255-269 (2006).

18. Tao, J. et al. Phosphorylation of MeCP2 at Serine 80 regulates its chromatin association and neurological function. Proc. Natl. Acad. Sci. U.S.A. 106, 4882-7 (2009).

19. Ebert, D. H. et al. Activity-dependent phosphorylation of MeCP2 threonine 308 regulates interaction with NCoR. Nature 499, 341-5 (2013).

20. Ho, K. L. et al. MeCP2 binding to DNA depends upon hydration at methyl-CpG. Mol. Cell 29, 525-31 (2008).

21. PHD Secondary structure prediction method. at <https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_phd.html>

22. Lyst, M. J., Connelly, J., Merusi, C. & Bird, A. Sequence-specific DNA binding by AT-hook motifs in MeCP2. FEBS Lett. 590, 2927-2933 (2016).

23. Baker, S. A. et al. An AT-hook domain in MeCP2 determines the clinical course of Rett syndrome and related disorders. Cell 152, 984-96 (2013).

24. Guy, J., Gan, J., Selfridge, J., Cobb, S. & Bird, A. Reversal of neurological defects in a mouse model of Rett syndrome. Science 315, 1143-7 (2007).

25. Cong, L. et al. Multiplex Genome Engineering Using CRISPR/Cas Systems. Science 339, 819-823 (2013).

26. Guy, J., Hendrich, B., Holmes, M., Martin, J. E. & Bird, A. A mouse Mecp2-null mutation causes neurological symptoms that mimic Rett syndrome. Nat. Genet. 27, 322-6 (2001).

27. Cheval, H. et al. Postnatal inactivation reveals enhanced requirement for MeCP2 at distinct age windows. Hum. Mol. Genet. 21, 3806-3814 (2012).

28. Clément, N. & Grieger, J. C. Manufacturing of recombinant adeno-associated viral vectors for clinical trials. Mol. Ther. Methods Clin. Dev. 3, 16002 (2016).

29. Gadalla, K. K. E. et al. Improved survival and reduced phenotypic severity following AAV9/MECP2 gene transfer to neonatal and juvenile male Mecp2 knockout mice. Mol. Ther. 21, 18-30 (2013).

30. Kriaucionis, S. & Bird, A. The major form of MeCP2 has a novel N-terminus generated by alternative splicing. Nucleic Acids Res. 32, 1818-23 (2004).

31. Lewis, J. D. et al. Purification, sequence, and cellular localization of a novel chromosomal protein that binds to methylated DNA. Cell 69, 905-14 (1992).

32. Nan, X., Tate, P., Li, E. & Bird, A. DNA methylation specifies chromosomal localization of MeCP2. Mol. Cell. Biol. 16, 414-21 (1996).

33. Kudo, S. et al. Heterogeneity in residual function of MeCP2 carrying missense mutations in the methyl CpG binding domain. J. Med. Genet. 40, 487-93 (2003).

34. Goffin, D. et al. Rett syndrome mutation MeCP2 T158A disrupts DNA binding, protein stability and ERP responses. Nat. Neurosci. 15, 274-83 (2012).

35. Shahbazian, M. et al. Mice with truncated MeCP2 recapitulate many Rett syndrome features and display hyperacetylation of histone H3. Neuron 35, 243-54 (2002).

36. Chen, R. Z., Akbarian, S., Tudor, M. & Jaenisch, R. Deficiency of methyl-CpG binding protein-2 in CNS neurons results in a Rett-like phenotype in mice. Nat. Genet. 27, 327-331 (2001).

37. Samaco, R. C. et al. A partial loss of function allele of Methyl-CpG-binding protein 2 predicts a human neurodevelopmental syndrome. Hum. Mol. Genet. 17, 1718-1727 (2008).

38. Gray, S J, Foti, S B, Schwartz, J W, Bachaboina, L, Taylor-Blake, B, Coleman, J, et al. (2011). Optimizing promoters for recombinant adeno-associated virus-mediated gene expression in the peripheral and central nervous system using self-complementary vectors. Hum Gene Ther 22: 1143-1153.

39. Feng, Y, Huang, W, Wani, M, Yu, X, and Ashraf, M (2014). Ischemic preconditioning potentiates the protective effect of stem cells through secretion of exosomes by targeting Mecp2 via miR-22. PLoS One 9: e88685.

40. Jovicic, A, Roshan, R, Moisoi, N, Pradervand, S, Moser, R, Pillai, B, et al. (2013). Comprehensive expression analyses of neural cell-type-specific miRNAs identify new determinants of the specification and maintenance of neuronal phenotypes. J Neurosci 33: 5127-5137.

41. Klein, M E, Lioy, D T, Ma, L, Impey, S, Mandel, G, and Goodman, R H (2007). Homeostatic regulation of MeCP2 expression by a CREB-induced microRNA. Nat Neurosci 10: 1513-1514.

42. Liu, J, and Francke, U (2006). Identification of cis-regulatory elements for MECP2 expression. Human molecular genetics 15: 1769-1782.

43. Adachi, M, Keefer, E W, and Jones, F S (2005). A segment of the Mecp2 promoter is sufficient to drive expression in neurons. Human molecular genetics 14: 3709-3722.

44. Liyanage, V R, Zachariah, R M, and Rastegar, M (2013). Decitabine alters the expression of Mecp2 isoforms via dynamic DNA methylation at the Mecp2 regulatory elements in neural stem cells. Molecular autism 4: 46.

45. Visvanathan, J, Lee, S, Lee, B, Lee, J W, and Lee, S K (2007). The microRNA miR-124 antagonizes the anti-neural REST/SCP1 pathway during embryonic CNS development. Genes Dev 21: 744-749.

46. Coy, J F, Sedlacek, Z, Bachner, D, Delius, H, and Poustka, A (1999). A complex pattern of evolutionary conservation and alternative polyadenylation within the long 3′-untranslated region of the methyl-CpG-binding protein 2 gene (MeCP2) suggests a regulatory role in gene expression. Human molecular genetics 8: 1253-1262.

47. Bagga, J S, and D'Antonio, L A (2013). Role of conserved cis-regulatory elements in the post-transcriptional regulation of the human MECP2 gene involved in autism. Human genomics 7: 19.

48. Newnham, C M, Hall-Pogar, T, Liang, S, Wu, J, Tian, B, Hu, J, et al. (2010). Alternative polyadenylation of MeCP2: Influence of cis-acting elements and trans-acting factors. RNA biology 7: 361-372.

49. Gadalla, K. (2012) Virus-mediated delivery of MECP2 as a potential tool for the treatment of Rett syndrome. PhD thesis, http://theses.gla.ac.uk/id/eprint/3501

50. Gadalla, K. K. E., Vudhironarit, T., Hector, R. D., Sinnett, S., Bahey, N. G., Bailey, M. E. S., Gray, S. J., Cobb, S. R. (2017) Development of a Novel AAV Gene Therapy Cassette with Improved Safety Features and Efficacy in a Mouse Model of Rett Syndrome. Mol Ther Methods Clin Dev. 5: 180-190. doi: 10.1016/j.omtm.2017.04.007.

51. Hinderer, C., Katz, N., Buza, E. L., Dyer, C., Goode, T., Bell, P., Richman, L. K., Wilson, J. M. (2018) Severe Toxicity in Nonhuman Primates and Piglets Following High-Dose Intravenous Administration of an Adeno-Associated Virus Vector Expressing Human SMN. Hum Gene Ther. doi: 10.1089/hum.2018.015.

CLAIMS

1. A synthetic polypeptide comprising: i) an MBD amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 1; and ii) an NID amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 2, wherein the polypeptide has a deletion of at least 50 amino acids, when compared to the full length MeCP2 e1 and e2 sequences (SEQ ID NOs: 3 and 4).
 2. A synthetic polypeptide according to claim 1 wherein the polypeptide has less than 90% identity over the entire length of the amino acid sequences of MeCP2 as depicted in SEQ ID NO: 3 and SEQ ID NO: 4.
 3. A synthetic polypeptide according to claim 1 or claim 2, having the structure: A-B-C-D-E wherein portion B of the synthetic polypeptide is said MBD amino acid sequence, and portion D of the synthetic polypeptide is said NID amino acid sequence, and further wherein: portion A of the synthetic polypeptide is less than 40 amino acids long and/or has less than 80% identity to the amino acid sequences as depicted in SEQ ID NOs: 5 and 6, calculated over the entire length of the amino acid sequences as depicted in SEQ ID NOs: 5 and 6; portion C of the synthetic polypeptide is less than 20 amino acids long and/or has less than 80% identity to the amino acid sequence as depicted in SEQ ID NO: 7, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 7; and/or portion E of the synthetic polypeptide is absent, a protein tag, and/or has less than 80% identity to the amino acid sequence as depicted in SEQ ID NO: 8, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 8.
 4. A synthetic polypeptide according to any of claims 1 to 3 wherein said synthetic polypeptide is capable of recruiting a NCoR/SMRT co-repressor complex component, such as NCoR/SMRT, HDAC3, GPS2, TBL1X or TBLR1, preferably TBL1X or TBLR1, to methylated DNA.
 5. A synthetic polypeptide according to any preceding claim wherein said synthetic polypeptide consists of less than 430 amino acids, preferably less than 400, 350, 320, 270, or 200 amino acids, and further preferably less than 180 amino acids.
 6. A synthetic polypeptide according to any preceding claim wherein said polypeptide comprises a nuclear localization signal (NLS), preferably wherein said NLS is comprised within the amino acid sequence between the MBD and NID.
 7. A synthetic polypeptide according to any preceding claim wherein the amino acid sequence between the MBD and NID amino acid sequences has less than 75% identity to the amino acid sequence as depicted in SEQ ID NO: 7, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 7, preferably less than 50%, and further preferably less than 30% identity.
 8. A synthetic polypeptide according to any preceding claim wherein the amino acid sequence between the MBD and NID amino acid sequences is less than 50 amino acids long, preferably less than 30 amino acids long, and further preferably less than 20 amino acids long.
 9. A synthetic polypeptide according to any preceding claim wherein the amino acid sequence between the MBD and NID amino acid sequences has a substitution or deletion of at least 10 consecutive amino acids compared to the amino acid sequence from position 207 to position 271 of the full length human wild type MeCP2 polypeptide sequence (e2 isoform) as shown in SEQ ID NO: 4.
 10. A synthetic polypeptide according to any preceding claim wherein the amino acid sequence adjacent to the carboxy end of the NID amino acid sequence has less than 75% identity to the amino acid sequence as depicted in SEQ ID NO: 8, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 8, preferably less than 50%, and further preferably less than 30% identity.
 11. A synthetic polypeptide according to any preceding claim wherein the amino acid sequence adjacent to the carboxy end of the NID amino acid sequence is less than 50 amino acids long, preferably less than 30 amino acids long or less than 20 amino acids long, and further preferably wherein there is no amino acid sequence adjacent to the carboxy end of the NID amino acid sequence.
 12. A synthetic polypeptide according to any preceding claim wherein the amino acid sequence adjacent to the amino end of the MBD amino acid sequence has less than 75% identity to the amino acid sequences as depicted in SEQ ID NOs: 5 and 6, calculated over the entire length of the amino acid sequences as depicted in SEQ ID NOs: 5 and 6, preferably less than 50%, and further preferably less than 30% identity.
 13. A synthetic polypeptide according to any preceding claim wherein the amino acid sequence adjacent to the amino end of the MBD amino acid sequence is less than 50 amino acids long, preferably less than 30 amino acids long or less than 20 amino acids long, and further preferably less than 10 amino acids long.
 14. A synthetic polypeptide according to any preceding claim wherein the polypeptide has less than 90% identity over the entire length of the amino acid sequences of MeCP2 as depicted in SEQ ID NO: 3 and SEQ ID NO: 4, preferably less than 80% identity, less than 70% identity, or less than 60% identity, and further preferably less than 40% identity.
 15. A nucleic acid construct that encodes a polypeptide according to any preceding claim.
 16. An expression vector comprising a nucleotide sequence encoding a synthetic polypeptide according to any of claims 1 to 14.
 17. An expression vector according to claim 16 further comprising one or more control elements selected from: a promoter for expression of the nucleotide sequence in neuronal cells, for example an Mecp2 or MECP2 promoter, one or more downstream miR binding sites from the MECP2 or Mecp2 3′UTR, and an AU-rich element.
 18. An expression vector according to claim 16 or claim 17 which is a viral vector, such as a retroviral vector, an adenoviral vector, an adeno-associated viral vector, or an alphaviral vector.
 19. A virion comprising a vector according to claim 18.
 20. A pharmaceutical composition comprising a synthetic polypeptide according to any of claims 1 to 14, a nucleic acid construct according to claim 15, an expression vector according to any of claims 16 to 18 and/or a virion according to claim 19.
 21. A cell comprising a synthetic genetic construct adapted to express a polypeptide according to any of claims 1 to 14.
 22. A cell according to claim 21 comprising a vector according to any of claims 16 to 18.
 23. A cell according to claim 21 or 22 for producing a virion according to claim 19.
 24. A method of treating or preventing disease in an animal comprising administering to said animal a synthetic polypeptide according to any of claims 1 to 14.
 25. A method according to claim 24 wherein said disease is a neurological disorder associated with inactivating mutation of MeCP2, for example Rett syndrome.
 26. A method according to claim 24 or 25, wherein said administering comprises administering a composition comprising a synthetic polypeptide according to any of claims 1 to 14, a nucleic acid construct according to claim 15, an expression vector according to any of claims 16 to 18, a virion according to claim 19 and/or a pharmaceutical composition according to claim 20.
 27. A synthetic polypeptide according to any of claims 1 to 14, a nucleic acid construct according to claim 15, an expression vector according to any of claims 16 to 18, a virion according to claim 19 and/or a pharmaceutical composition according to claim 20 for the treatment or prevention of a neurological disorder associated with inactivating mutation of MeCP2, for example Rett syndrome.
 28. The use of a synthetic polypeptide according to any of claims 1 to 14, a nucleic acid construct according to claim 15, an expression vector according to any of claims 16 to 18, a virion according to claim 19 and/or a pharmaceutical composition according to claim 20 in the manufacture of a medicament for the treatment or prevention of a neurological disorder associated with inactivating mutation of MeCP2, for example Rett syndrome.